Europe should retain its ambition


The EU is planning a new round of billion-euro Flagship research projects, learning from its rocky
experience so far. It must be open to all types of science.


Despite the rising tide of Euroscepticism, exemplified most obviously by this year’s Brexit vote, the European Commission still has big ambitions for the continent's science. Preparations for the EU’s ninth framework research programme (FP9), to be launched in 2021, will begin in earnest next year. And, perhaps surprisingly given recent history, it will include another couple of ten-year, billion-euro FET Flagship projects.

FET stands for future and emerging technologies, and its flagship projects aim to promote digital technologies while addressing policy priorities. In commission-speak, they are supposed to be ‘game-changing’ and ‘visionary’, with the potential to deliver ‘transformational economic or social impact’. These are lofty but laudable goals that deserve support. But, given their immense cost, the projects must be carried out in the best way. And the commission is still trying to work out what that is.

To say that the concept got off to a rocky start would be an understatement. The Human Brain Project, one of the two initial flagships selected in 2013, bitterly divided the European neuroscience community and had to be restructured after a painful and public showdown. And although the second, called Graphene, has trundled along peacefully, many scientists and officials across Europe wondered if the commission would risk launching any more. In fact, it quietly launched a third, a quantum-science project, earlier this year, without the razzmatazz of the public competition that accompanied the first round. Quantum is considered important to the development of the European Open Science Cloud, a virtual environment for storing and sharing data, and to its various security-related digital ambitions. It is due to launch in 2018 (see go.nature.com/1semikt).


STEERING THE FLAGSHIPS 

On 15 December, member states and European research organizations 
met in Brussels to discuss what comes next. The meeting followed the 
outcome of a public consultation in which scientists were invited to 
submit possible flagship ideas. In the next few months the commission 
will settle on three or four themes; specific calls for proposals will be 
issued in mid-2017. 

Last week's meeting settled on three general themes, which can still 
change or expand. One is energy, environment and climate change; 
another is health and life sciences; and the third, described as ‘ICT for
the connected society’, covers communications technology. Then, as in
the first round of flagships, five or six winners will be given a generous 
grant to develop their concepts, and one or two of these will be selected 
for funding within the next framework programme. 

Of the two dozen submissions already sent in by scientists, the commission felt that 14 ticked the requisite boxes of size and cross-nationality, ambition, feasibility and alignment with EU policy priorities. The most popular ideas (and these offer early clues to how the commission will frame its official call for proposals) were in health care, climate change and food security, robotics and renewable energy. No surprises there. But other persuasive concepts ranged from synthetic biology and regenerative medicine to digital humanities.

How to choose between them? In post-Brexit, post-Human Brain Project Brussels, the mood is conciliatory towards member states. Not always fairly, the commission, an executive body, is often accused of making decisions without appropriate input from national politicians and external experts. Certainly, in the creation of its first two flagships, the commission gave academic-led consortia too much leeway to define their own management structures. This allowed the Human Brain Project temporarily to veer off course, concentrating too much power in too few hands and sidelining most of the project scientists. This time, the commission has been at pains to consult as widely as possible with the scientific community, with research bodies and with industry, and most particularly with member states.

The two suggested projects with the most support so far are ‘Future of Healthcare’ and ‘Robot Companions for Citizens’, strong proposals that were both finalists in the first flagship round. Future of Healthcare is a consortium led by scientists in Berlin and Lausanne, Switzerland. It aims to develop the technologies and the legal and regulatory environment for a continent-wide digital health-care system, as well as the integrated molecular and imaging technologies that will help promote the development of personalized medicine. Robot Companions is led by scientists in Genoa, Italy, and will exploit multiple disciplines, from artificial intelligence to cognitive sciences, to develop soft-bodied, ‘perceptive’ robots for those who are old and lonely; it will also design robots that can help in surgical procedures, farming and other areas.

Some at last week's meeting said that scientists involved in related consortia should pitch a joint idea. That is not a good idea. As UK representatives warned, doing so could blur goals and create consortia too large to manage projects efficiently. Alas, the views of the United Kingdom have been of diminishing importance in Europe since the Brexit vote. But the commission should think carefully about forcing such marriages, which would also prevent groups from competing against each other.

It should also assess the extent to which it will unquestioningly 
follow the views of the member states — whose national interests 
sometimes conflict with the broad European goals they have signed up 
to. Governments will, of course, lobby for money to be spent on health 
care and robots, which yield tangible and vote-winning outcomes. 
The case for massive resources for cross-national digital humanities 
projects — which aim to preserve shared history and culture — is a 
harder sell. That is why the commission must explicitly make room for 
them to compete when it sends out its funding call. Cooperation and 
shared values may be falling out of political fashion, but they remain 
the bedrock of the European project.


Snow blind


Have a bet on a white Christmas, but don’t fall
for an old chestnut.


It just took some treetops to glisten and some children to listen for Bing Crosby to enjoy a white Christmas. Bookmakers are a more cynical bunch, so they usually demand to see at least a single white flake fall during the 24 hours of 25 December. The chances of snow at Nature HQ this weekend are diminishing as Britain basks in unseasonably mild conditions: bookmakers put the odds of an official white Christmas in London at about 10 to 1.

If that seems too long a shot, then science offers a way to make the bet more attractive to punters. By combining it with a second bet on an event much more likely to happen, bookies can exploit a psychological tic called the conjunction fallacy. The odds, for example, of a cover of the Rolling Stones’ classic ‘You can’t always get what you want’ reaching the UK Christmas number-one slot are a much shorter 9/4; it’s one of the favourites. (All odds correct as Nature went to press.) And although logic and statistics tell us that the chance of both events occurring must be lower than that of either single event alone, gamblers routinely fail to recognize that.

Study after study shows that pairing with a dead cert makes an unlikely wager seem more, not less, likely to happen. And that makes people more willing to put money on an outsider. This logical illusion can explain much fixed-odds betting on sport, including football. Gamblers routinely think there is more chance that West Bromwich Albion will win at Arsenal on Boxing Day (9/1) if the wager is combined with a Manchester United home victory over Sunderland (2/9).

Exactly why this happens is not clear, but it seems that some gamblers play the odds off against each other in their heads, and assume, incorrectly, that the combined chance of the two is an average of the odds: that the extreme likelihood of the second option somehow tempers the outlandishness of the first. It can be an expensive mistake.

With supreme knowledge of the human condition, one might think that scientists would be immune from making rash bets. Not so. This year, astrophysicist Shrinivas Kulkarni has lost a US$1,000 wager on the origins of fast radio bursts, and another astrophysicist, David Wiltshire, has stumped up $200 for a lamp after losing a 10-year wager with a colleague on the role of the cosmological constant in dark energy.

Although such simple bets between researchers (sometimes friendly and sometimes not so) are a long-standing feature of science, perhaps the most lucrative are those in which scientists (just like bookmakers) pit their calculated professionalism against the optimism and emotion of those who follow a lost cause. This year has also seen climate scientists cash in on bets made with sceptics about the continued warming of the planet. Indeed, the annual meeting of the American Geophysical Union last week had a session dedicated to betting on climate change.

Bets accepted or refused can be a good way to gauge how firmly a sceptic truly believes their contrarian position, because wagers typically follow strongly and honestly held (however unlikely) opinions. (Hence, some fans of Sunderland will see the odds of 14/1 on them winning the above match as too good to turn down.)
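The conjunction arithmetic described above can be checked with the odds quoted in this editorial. The sketch below is illustrative only: it assumes the two events in each pair are independent and ignores the bookmaker's built-in margin, so its probabilities are approximations, and the function names are hypothetical rather than any bookmaker's method.

```python
from fractions import Fraction


def implied_probability(numerator: int, denominator: int) -> Fraction:
    """Probability implied by fractional odds numerator/denominator (10/1, 9/4, ...).

    Assumption: 'fair' odds, so the bookmaker's margin is ignored.
    """
    return Fraction(denominator, numerator + denominator)


def conjunction(p_a: Fraction, p_b: Fraction) -> Fraction:
    """Chance that both events happen, assuming they are independent."""
    return p_a * p_b


def naive_average(p_a: Fraction, p_b: Fraction) -> Fraction:
    """The mistaken 'average of the odds' intuition described in the text."""
    return (p_a + p_b) / 2


def fair_odds_against(p: Fraction) -> Fraction:
    """Convert a probability back into fractional odds against (1/11 -> 10)."""
    return (1 - p) / p


if __name__ == "__main__":
    # Odds as quoted in the editorial; each pair forms a two-leg bet.
    pairs = {
        "White Christmas in London + Stones cover at number one": ((10, 1), (9, 4)),
        "West Brom win at Arsenal + Man Utd beat Sunderland": ((9, 1), (2, 9)),
    }
    for label, (odds_a, odds_b) in pairs.items():
        p_a = implied_probability(*odds_a)
        p_b = implied_probability(*odds_b)
        both = conjunction(p_a, p_b)
        print(label)
        print(f"  single legs:       {float(p_a):.3f} and {float(p_b):.3f}")
        print(f"  both (product):    {float(both):.3f}, about {float(fair_odds_against(both)):.0f}/1")
        print(f"  'average' fallacy: {float(naive_average(p_a, p_b)):.3f}")
```

Run as a script, the first pair comes out at roughly 0.028 (about 35/1), well below either leg on its own, whereas the naive average lands near 0.2; that gap is what the fallacy exploits.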

Some events baffle punters, scientists and bookmakers alike, and 2016 has seen plenty of those. So who would dare to argue that a theoretical ‘social-physics’ model (used, among other things, to predict the behaviour of plastic crystals) would do a worse job than pollsters and experts at predicting the results of political votes such as this year’s Brexit referendum and US presidential election? Physicists last month published such a mean-field model, which they say describes the dynamics of two-group conflicts on the basis of the interactions between group members, opponents and how willing people are to change their minds (H. T. Diep et al. Physica A 469, 183-199; 2017).

The model’s output shows whether each side in a political dispute will tend towards negotiation or conflict, and the often wild swings and oscillations in their attitudes towards each outcome along the way. It is not a tool of prediction, the physicists caution, but rather one of anticipation for strategic purposes. That seems a sensible approach given recent events, which have shaken faith in predictions of all sorts. So in that spirit, as 2016 draws to an end and as Bing almost sang: your days may be merry and bright, and all your Christmases may be white.


On retirement 


When great colleagues end their careers, 
employers should recognize their value. 


According to the US psychologist David Hershenson, people who retire can experience up to six separate statuses. Retrenchment comes when they cut back on work and principal employment, and Exploration sees them think about what activities to do next instead. In their Try-out status, retirees see how well suited they are to new activities (including inactivity), and Involvement marks their long-term participation in pursuits they enjoy and can stick with. When new options present themselves, retirees are faced with Reconsideration. And should they move on from an activity, or indeed return to work, then they Exit. Not coincidentally, the six statuses together form the acronym RETIRE (D. B. Hershenson J. Aging Studies 38, 1-5; 2016).

It may seem contrived, but the study of retirement, and finding ways to investigate it, is an important business as the population ages. Not least is the question of who should be paid to retire, by whom and when. As funds dwindle, retirement ages are creeping up. But do some workers deserve an earlier break from the daily grind than others? The government of the Netherlands has put some serious thought into whether people in some professions (particularly occupations involving heavy manual labour, such as construction) should have their retirement age fixed or reduced, even while people in less-demanding jobs see their retirement ages rise (N. Vermeer et al. Labour Econ. 43, 159-170; 2016). In the United Kingdom, the opposition Labour Party leader Jeremy Corbyn has suggested something similar. As Jane Austen wrote in Sense and Sensibility: “It isn’t what we say or think that defines us, but what we do.”

Policies on retirement, then, and the studies that inform them, need to broaden their assessment to include an earlier status: work. If retirement is a well-earned break after a long and productive career, how can researchers distinguish the employees who should enter it earlier than others from those who should enter it later?

At Nature we have our own internal scoring system, with its own 
(slightly) contrived acronym. We look for people who perform Work 
that is consistently Excellent and Notable, and that has helped to define 
the cultural and scientific Zeitgeist for a significant time — usually 
measured at about 40 years. We call it the WENZ measure. Every 
organization should have a WENZ. And when people with the WENZ 
factor retire, they should do so in the full and certain knowledge that 
their contribution has been valued by colleagues. More than that, they 
should know they will be missed.
WORLD VIEW


How Woody Guthrie can help us fight for science


After the election of Donald Trump, Jacqueline M. Vadjunec offers a message of resistance and hope from deep within the US Bible Belt.


When I moved from Massachusetts almost a decade ago to work at Oklahoma State University, many colleagues were afraid for my career. I work on the human dimensions of global environmental change, and Oklahoma has a long and complex history with science, including climate change.

Oklahoma was the first state to ratify ‘anti-Darwin’ legislation in 1923 and today is home to key sceptics in the war on climate change, including Republican Senator James Inhofe and Scott Pruitt, the state’s attorney-general, who earlier this month was nominated to run the US Environmental Protection Agency. These politicized debates trickle down, and both evolution and human-induced climate change remain contested topics, especially in schools.

However, Oklahoma is also the home of protest singer Woody Guthrie, a visible example of resistance in the 1930s class and culture wars between rural and urban values. If Woody could use his voice to speak up, so can scientists.

In truth, my career is fine, and my colleagues are supportive. I not only manage, but also thrive. And if I can, then so can other scientists who find themselves concerned about the tidal wave of climate scepticism that comes with last month’s election of Donald Trump and his associates. The election might have powerful effects on science, policy and funding. But I want to stress the power and promise of human agency.

In my case, adjustments are minor, but might seem substantial elsewhere. I realize that in my day-to-day actions in the classroom and in my research with family farmers and ranchers, I probably hold a minority viewpoint on human-induced climate change. In the classroom, I am sensitive to the fact that many of my students have family ties to the oil and gas industry. I regularly see them struggle with the local contradictions. I try to create a place of mutual respect to embrace this struggle on their own terms, while also trying to focus on our role as global citizens facing global challenges. It is not always an easy balancing act; these experiences have taught me that most students care about global environmental change, but often have little previous exposure to such issues, in part because of the decisions of local politicians and school boards. In our debriefing at the end of the semester, students often express frustration that they weren’t exposed to many of the issues surrounding climate change at a younger age.

I also learned that actively listening to (instead of talking at) farmers and ranchers who care about sustaining their land and livelihoods is a good way to open dialogue. We can then find common ground on pressing environmental issues, such as the depletion of the Ogallala Aquifer, encroachment of invasive and nuisance woody-plant species on pasture lands, and the compounding impacts of long-term cyclical drought. People in Oklahoma care about the long-term sustainability of their natural resources, but they often use language that is different from that of climate scientists and elected officials.

We should remember the power of the small. In Weapons of the Weak (Yale Univ. Press, 1985), James C. Scott illustrates the power of “everyday forms of resistance”. It is through these small acts (both intentional and unintentional) that power can be contested, destabilized and renegotiated. There may be increased climate scepticism, but there will also be more scientists, teachers and citizens banding together to respond.

Despite official policies that limit climate-change education, a recent survey of 115 science teachers in Oklahoma showed that more than 80% teach climate change in state schools, either formally or informally (N. M. Colston and T. A. Ivey J. Educ. Policy 30, 773-795; 2015). Faced with few locally available teaching resources, most teachers write their own lesson plans. They also take advantage of ‘teach the controversy’ campaigns (intended by some to undermine the scientific consensus) to introduce students to locally controversial topics such as human-induced climate change, which otherwise might be seen as off limits. Consequently, more than two-thirds of these teachers say that they experience no pushback from students, parents or administrators (N. M. Colston and J. M. Vadjunec Geoforum 65, 255-265; 2015).

In resisting the mood of anti-science, researchers need to reach out to a diverse public in more accessible ways. We also need to accept different ways of knowing or even talking about climate change: ways that open doors to start a conversation; ways that are more context specific, culturally sensitive and nuanced than science in general might be comfortable with.

For example, state politics in Oklahoma are shaped in part by continuous interactions with Oklahoma's 39 Tribal Nations. These Native Americans, in conjunction with researchers and media artists, are speaking up to provide their own unique perspectives on climate change (see go.nature.com/2grktji). Such projects show that indigenous people care deeply about climate-change issues, but that when it comes to adaptation and mitigation, they would like their traditional knowledge to be valued along with that produced by Western science.

Where there is climate war, there is also climate resistance, in large 
and small ways. I urge scientists not to lose heart, but to develop tools 
and projects that are useful to citizens, as well as to our peers and 
funding agencies. In practising such open-minded science, we might 
find that we have more allies than are visible at first glance.


Jacqueline M. Vadjunec is an associate professor in the Department 
of Geography at Oklahoma State University in Stillwater. 
e-mail: jacqueline.vadjunec@okstate.edu 
Selections from the
scientific literature 


RESEARCH HIGHLIGHTS 


Penis bone lost 
through evolution 


Our monogamous lifestyle 
may explain why humans, 
unlike many other mammals, 
lack a penis bone. 

The bone, called a baculum, 
rests at the end of the penis 
and is thought to provide 
structural support and prolong 
copulation. Matilda Brindle 
and Christopher Opie at 
University College London 
analysed the size of bacula 
in nearly 2,000 species of 
mammal, including primates 
and carnivores. They found 
that species that copulate for 
longer tend to have longer 
bacula. The same is true of 
animals that have more than 
one mate or have seasonal- 
breeding patterns, which 
lead to intense competition 
between sperm from different 
males after mating. 

The results show that the 
baculum first evolved 
145-95 million years ago, 
in the common ancestor of 
primates and carnivores. It 
disappeared from the human 
lineage after our split with 
chimpanzees, and this may 
have coincided with the switch 
towards a more monogamous 
lifestyle, the authors say. 

Proc. R. Soc. B 283, 20161736 
(2016) 


Device breaks 
cooling record 


A cooling technology, if scaled up, could decrease temperatures by as much as 37 °C, potentially boosting the capabilities of refrigeration equipment.

Shanhui Fan at Stanford University in California and his team built a device that includes a thermal emitter that gives off heat in the mid-infrared range. Such wavelengths correspond to the ‘transparency window’ of Earth’s atmosphere, allowing the heat radiation emitted by the apparatus to be released into space. Previous systems have reduced their temperatures by up to only 20 °C in low-altitude areas with moderate humidity levels. But the team broke this record with the addition of several features, including a vacuum chamber around the emitter. This ensured that heat was emitted only into space, and not into the surrounding air, increasing the amount of heat removed.

With further development, the technology could be used for applications including energy-efficient air conditioning.
Nature Commun. 7, 13729 (2016)


Smaller monsoon boost predicted


Climate change may produce smaller-than-expected increases in rainfall in the world’s monsoon regions over the coming decades, thanks to changes in land use.

More than 70% of the global population live in monsoon areas. Benjamin Quesada of the Karlsruhe Institute of Technology in Germany and his colleagues ran global climate models with and without projected deforestation and other land-use changes to compare how monsoon patterns might shift by the end of the century.

The models suggest that monsoon rain will generally become more intense in a warming world. But the projected increase was 30% smaller, on average, when the team accounted for changes in land use and land cover. Shifts in monsoon rainfall might affect regional water resources and agricultural yields, the authors say.
Geophys. Res. Lett. http://doi.org/bv38 (2016)


Genes that make
mice youthful


Four genes that reprogram adult cells into embryonic-like stem cells can also reverse some signs of ageing.

The four genes encode Yamanaka factors, which are essential for embryonic development, but usually cause tumours when expressed long-term in animals. Juan Carlos Izpisua Belmonte at the Salk Institute in La Jolla, California, and his colleagues switched the genes on for two days per week over several weeks in mice that had an ageing disorder called progeria. The animals lived about 30% longer, and showed improvements in tissue healing and other signs of ageing, such as organ failure. In normal aged mice, switching on the genes led to improved recovery from muscle injury and to other signs of youthfulness. The mice did not develop cancer. The authors link the rejuvenation to epigenetic remodelling (changes in the chemical marks on DNA that do not alter its sequence but influence gene expression).
Cell 167, 1719-1733 (2016)


486 | NATURE | VOL 540 | 22/29 DECEMBER 2016
© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.


Targeting host 
genes for therapy 


By inactivating any one of five 
human genes, scientists can 
prevent HIV from entering 
and growing in immune cells. 

Antiviral therapies targeting 
host genes that the virus 
depends on, rather than 
targeting the virus itself, are 
promising because these genes 
do not mutate as frequently 
as viruses do. This could 
avoid the development of 
drug resistance. Bruce Walker 
at the Ragon Institute of 
MGH, MIT and Harvard in 
Cambridge, Massachusetts, 
and his colleagues screened 
the genome of human T cells 
and identified five genes 
not essential to cell survival 
whose inactivation protected 
cells from HIV infection. 
Cultured cells lacking these 
genes resisted HIV infection. 
The genes encode proteins 
that facilitate virus entry into 
human cells, and one that 
mediates cell aggregation, 
which allows the virus to 
spread between cells. 

The authors say their 
approach could also be used 
to find drug targets for other 
pandemic viruses. 

Nature Genet. http://dx.doi.org/10.1038/ng.3741 (2016)


OCEAN SCIENCE
East Antarctic ice 
melts from below 


Ocean heat is melting a 
floating ice shelf in East 
Antarctica, raising concerns of 
accelerated glacier discharge 
and sea-level rise. 

East Antarctica’s large ice sheet was thought to be more stable than that of West Antarctica. But Stephen Rintoul at the University of Tasmania in Hobart and his colleagues found that increasing ocean heat is weakening the Totten Ice Shelf in East Antarctica.

The shelf supports glaciers 
whose volume is equivalent to 
3.5 metres of global sea-level 
rise. 

The team analysed 
oceanographic data collected in 
2015 and found deep channels 
at the front of the ice shelf 
through which large volumes 
of temperate deep-ocean water 
flow into the ice shelf’s cavity. 
Sci. Adv. 2, 1601610 (2016) 


Early humans 
cooked vegetables 


Humans cooked and ate a 
variety of plants — mostly 
grasses and aquatic plants — 
as early as 8,200 years ago. 
Scientists have often 
found signs of prehistoric 
meat and milk processing, 
but direct evidence of early 
plant cooking has been rare. 
Richard Evershed at the 
University of Bristol, UK, 
and his colleagues analysed 
residues from 110 ancient 
pottery fragments discovered 
in the Libyan Sahara. The 
pieces were found in a cave 
and a rock shelter, both of
which also housed well- 
preserved plants several 
thousand years old (legumes 
of Cassia pictured). Besides 
animal fats, the team found 
plant lipids in most of the pots. 
Some pots seem to have 
been used exclusively for fruits 
and seeds, but the team also 
uncovered evidence of leaves 
and stems being cooked. 
Nature Plants http://go.nature.com/2gns862 (2016)


CANCER GENETICS 


Why melanoma is 
worse in men 


Differences in the expression
of a particular gene could
explain why men with skin 
cancer tend to have a lower 
survival rate than women. 

The gene, PPP2R3B, 
is expressed from both 
X chromosomes in women 
and from the X and Y 
chromosomes in men. Alan 
Spatz at McGill University 
in Montreal, Canada, and 
his colleagues studied tissue 
samples from people with 
melanoma, and found that 
greater expression of PPP2R3B 
correlated with longer survival 
times. In cultured cells, high 
levels of PPP2R3B expression 
slowed melanoma growth 
by interfering with DNA 
replication and slowing cell 
division. 

Expression of the gene was 
higher in women than men, 
which could explain why 
women with melanoma have 
better clinical outcomes, the 
authors say. 

Sci. Transl. Med. 8, 369ra177 
(2016) 


Where Ceres 
hides its water 


Frozen water has been lurking beneath the rocky surface of the Solar System's biggest asteroid since its birth billions of years ago.

NASA’s Dawn spacecraft began orbiting Ceres (pictured), which is also a dwarf planet, in 2015. This allowed a team led by Thomas Prettyman at the Planetary Science Institute in Tucson, Arizona, to measure hydrogen at the asteroid’s surface. Water inside Ceres chemically alters the surface, leaving a hydrogen imprint there. The highest hydrogen concentrations appeared at mid to high latitudes.

A second study looked at ice trapped in permanently shadowed regions of Ceres’ surface. Of the 634 craters studied, only 10 contained icy material, say Thomas Platz of the Max Planck Institute for Solar System Research in Göttingen, Germany, and his colleagues. Ceres, like Mercury and the Moon, can apparently trap frozen water in dark areas for long periods of time, they add.

Science http://doi.org/bv3z; Nature Astron. 1, 0007 (2016)


GENOME EDITING 


Enzyme switches 
turn CRISPR off 


Inhibitors of a gene-editing 
system called CRISPR-Cas9 
could one day provide a safety 
switch, allowing researchers 
to halt the system’s activity in 
cells. 

CRISPR-Cas9 is a 
naturally occurring bacterial 
defence mechanism that 
scientists have harnessed to 
alter DNA sequences. Alan 
Davidson at the University 
of Toronto in Canada and 
his colleagues searched for 
bacterial proteins that block 
the DNA-slicing Cas9 enzyme 
from the bacterium Neisseria 
meningitidis, and found three 
families of such proteins. 

The inhibitors halted 
CRISPR-Cas9 editing in 
human cells, suggesting that 
they could be used to better 
control genome editing. 
They could be important if
CRISPR-Cas9 is to be used
for gene therapy in people, or
to edit the genomes of entire
populations in the wild.

Cell 167, 1829-1838 (2016) 
SEVEN DAYS


Gene triangle 


Britain’s fertility regulator has decided to allow, in “certain, specific cases”, the birth of babies from embryos that have been modified to contain three people's DNA. On 15 December, the UK Human Fertilisation and Embryology Authority announced that clinics can start to apply for licences to conduct limited trials of the technique, which aims to prevent mothers from passing on mutations in cellular organelles called mitochondria. The move makes the United Kingdom the first country to explicitly permit the controversial therapy.


‘Corrosive’ Brexit
Uncertainty in the wake of Britain’s vote to leave the European Union is having a “corrosive effect” on UK science that could cause lasting harm to the country’s economy, according to an inquiry by a committee in the House of Lords, the United Kingdom's upper house. The inquiry’s report, published on 20 December, underscores the importance of freedom of movement for EU scientists and criticizes ministers for sending mixed messages on whether immigration rules for students will change. To strengthen UK science, the committee recommends finding opportunities to establish at least one new international research facility, and offering compelling research-funding and settlement packages to attract top talent from around the world.


“If Trump turns off the satellites, California will launch its own damn satellite.”

California governor Jerry Brown responds to suggestions that budget cuts could threaten Earth-observing-satellite programmes, at a meeting of the American Geophysical Union in San Francisco, California, on 14 December.


Replica of ice-age cave opens in France


A replica of Lascaux, a cave in southwestern France that is famous for its galleries of ice-age paintings, opened its doors to the public on 15 December. The original cave has been closed to visitors since 1963, after heavy tourist traffic caused the stunning paintings, estimated to be 18,000 years old, to deteriorate. The €57-million (US$59-million) centre, Lascaux 4, is at the foot of the hills in which the original was discovered in 1940 and is a replica of almost all of the cave, including its dark and damp atmosphere. The first replica of the cave, which opened in 1983, featured just the two main galleries.


Mini accelerator
Physicists are a step closer to creating a miniature particle accelerator, it was announced on 14 December. The Advanced Wakefield Experiment, or AWAKE, based at CERN, Europe’s particle-physics laboratory near Geneva, Switzerland, has yet to accelerate particles. But tests in its initial week of operation showed for the first time that pulses of protons can generate the wave of plasma needed to do just that. Harnessing the effect, which had previously been seen only in simulations, could eventually lead to smaller, cheaper particle accelerators.


Telescope setback 
A judge in Hawaii has 
overturned the 2014 state 
approval of the Thirty Meter 
Telescope (TMT) consortium’s 
sublease with the University 
of Hawaii at Hilo, which the 
project needs to build its 
US$1.5-billion instrument on 
Mauna Kea. Plans to build the 
telescope have been mired in 
conflict, but the 15 December 
ruling is a smaller stumbling 
block than the state supreme 
court’s decision a year ago to 
rescind the building permit for 
the project, on which a fresh 
round of hearings is under 
way. The sublease ruling stems 
in part from a legal challenge 
from Native Hawaiians, some 
of whom say that the TMT 
will desecrate sacred land. The 
telescope has a back-up site in
the Canary Islands if it cannot 
be built in Hawaii. 


PEOPLE 


Science exile 

In a display of solidarity with
troubled particle physicist
Adlène Hicheur, scientists
held an international
high-energy-physics workshop on
13 December in the small town
of Vienne, southeast France,
where Hicheur is under house
arrest. Hicheur had previously
been jailed in France for
alleged terrorism offences (a
conviction strongly disputed by
him and his colleagues), and
after his release in 2012 had
restarted his research career
in Brazil. He was mysteriously
deported from Brazil in July.
Having renounced his French
nationality in October, the
Franco-Algerian physicist
intends to fly to Algeria within
two weeks. French authorities
have agreed to lift the house
arrest on his departure day.


Trump energy pick
US president-elect Donald
Trump nominated Rick Perry
to run the US Department of
Energy on 14 December. Perry
(pictured) governed Texas
from 2000 to 2015 and sought
the Republican presidential
nomination in 2012. As
governor, he supported
fossil-fuel production, and
questioned the science
underlying climate change.
Critics have voiced concern
over his lack of scientific or
technical background. In 2011,
he proposed eliminating the
energy department. Despite
his ties to the fossil-fuel
industry, the share of energy
production from renewables in
Texas increased substantially
during his term as governor
(see page 492). Trump has also
reportedly selected Montana
congressman Ryan Zinke
to head the Department of
the Interior, which oversees
federal public lands, natural
resources and Native American
programmes. Like Perry,
Zinke has expressed doubt
over human-induced climate
change. He has voted in favour
of coal extraction and oil and
gas drilling. Both nominations
will need confirmation by the
Senate.


TREND WATCH


US drug approvals fell by more
than 50% in 2016, according to
a 14 December presentation by
an official at the US Food and
Drug Administration (FDA).
The agency has approved 19 new
drugs so far this year, its lowest
annual tally in nearly a decade.
The FDA attributed the decline
to fewer submissions and the
approval of five drugs ahead of
schedule in 2015. The agency
also rejected more drugs: in 2016,
61% of the FDA’s decisions were
approvals, compared with more
than 95% in 2015.


No Stamina 

The Republic of Georgia has 
banned controversial stem- 
cell entrepreneur Davide 
Vannoni from working in the
country. In March last year,
Vannoni was convicted in 
Italy on charges of conspiracy 
and fraud for administering 
unproven stem-cell therapies 
in that country, but his 
sentence was suspended on 
the condition that he halt his 
procedures. In October, Italian 
prosecutors investigated 
allegations that his Stamina 
Foundation was offering 
treatments again, in Tbilisi. 
They sent documentation 
about Vannoni’s case to the 
Georgian government, which 
responded with the ban, 
according to news reports. 


EVENTS 


Polar adventure 

The Antarctic 
Circumnavigation Expedition 
(ACE) set off from Cape Town, 
South Africa, on 20 December 
on a three-month research 
cruise around the frozen 
continent. A 55-strong 
international research crew 
on board the Akademik 
Treshnikov, a Russian research 
vessel chartered for the 
voyage, will collect a variety of 
marine data for studies on the 
impact of climate change in 
the Southern Ocean. Swedish 
philanthropist Frederik 
Paulsen, founder of Ferring 
Pharmaceuticals, is the main 
sponsor of the expedition, 
which has been organized by 
the newly established Swiss 
Polar Institute in Lausanne. 


STEEP DROP IN US DRUG APPROVALS 


The US Food and Drug Administration approved only 19 new 
therapies in 2016, marking a 10-year low. 


(Chart key: approved; not approved.)
55,006


The overall number of 
doctorate recipients in the 
United States in 2015, of 
whom 25,403 were female. 
Source: National Science Foundation 


Newton first edition 


A rare copy of Isaac Newton's 
groundbreaking work 
Philosophiæ Naturalis
Principia Mathematica has 
become one of the most 
expensive printed science 
books in history. On
14 December, an anonymous
bidder paid US$3.7 million for 
a first edition of the book at 
an auction at Christie’s in New 
York City — more than twice 
as much as the auction house 
had expected. First published 
in 1687, the work includes 
Newton's law of universal 
gravitation and his laws of 
motion. A copy that had been 
presented to King James II of
England sold for $2.5 million 
in 2013. 


Eczema activity 

On 14 December, the
US Food and Drug
Administration approved the
first new drug to treat eczema
(a chronic inflammation
of the skin that causes
severe itching) in more
than a decade. The drug,
an ointment called Eucrisa
(crisaborole), inhibits the 
protein phosphodiesterase 4 
and was developed by Anacor 
Pharmaceuticals of Palo Alto, 
California. Two days later, 
Swiss drug-manufacturing 
giant Novartis announced its 
intention to buy a separate 
company, Ziarco in Sandwich, 
UK, which is developing an 
oral eczema treatment that 
targets a histamine receptor. 
Novartis did not disclose how 
much it was paying for the 
company. 
Marine surveillance drones are likely to be among the technologies funded by a new EU research fund.


FUNDING 




Peaceful EU starts to fund 
military research 


Shift in focus comes in response to a changing world order and the threat of terrorism. 


BY ELIZABETH GIBNEY 


Faced with a changing world order and
buffeted by a slew of political crises and
terrorist attacks, the historically civilian
European Union is bolstering its military
capabilities. And that means making its first major
investment in military research.

On 1 December, the European Parliament
approved a €25-million ($26-million) fund
dedicated to military research. It will form
part of a proposed broader European Defence
Fund, aimed at making military innovation
more efficient and enlarging Europe’s
industrial defence base.

The research portion of the fund will cover 
electronics, advanced materials, encrypted soft- 
ware and robotics. The European Commission, 
the EU’s policymaking arm, expects to invest a 
total of €90 million by 2020. It hopes the figure 


will rise to €500 million a year for defence 
research from 2021. The sum is dwarfed by the 
EU's major research-funding programme, Hori- 
zon 2020, which will hand out €80 billion over 
7 years, the €8.8 billion spent by EU member 
states on defence research in 2014, and what 
the United States and probably China spend on 
defence research (see ‘Military metrics’). 

But some scientists fear that funding defence 
research is a step in the wrong direction for 
the EU. “It will necessarily divert much-needed
funding from civilian R&D budgets, at a time
when they are urgently needed for areas such as
climate and energy,” says Stuart Parkinson,
executive director of UK-based advocacy
group Scientists for Global Responsibility. 

One of the EU’s main objectives is to promote 
peace. In the past, defence was seen as a national 
issue rather than something for the bloc to han- 
dle. The decision to create the research fund is 
in part driven by a drop in national defence- 
research funding, which declined by 18%, or 
€1.9 billion, between 2006 and 2014 in real 
terms, according to the European Defence 
Agency (EDA) in Brussels, which will manage 
the research fund on behalf of the commission. 

The perception that international security is 
under threat is a driver for the broader defence 
fund. In November, the European Parliament 
passed a motion that says terrorists are tar- 
geting the continent on an unprecedented 
scale, and that Europe is “now compelled to 
react to an arc of increasingly complex crises”. 
The motion notes that for the first time since 
the Second World War, “borders in Europe 
have been changed by force” — referring to 
the annexation of Crimea in 2014 and the 
incursion into Ukraine by Russian forces. 

In September, Jean-Claude Juncker, 
president of the commission, made a simi- 
lar point when speaking about the European 
Defence Fund. “Europe can no longer afford 
to piggy-back on the military might of others,” 
he said. 

The rules for participating in the research
fund are still being discussed, but it will be
modelled loosely on Horizon 2020. It will
probably promote projects that combine
researchers from industry and academia and
from different countries, says Denis Roger, a
director in charge of research at the EDA.


MILITARY METRICS


The European Union's 28 member states
combined spend much less on defence research
than the United States, and probably China.
*Figures for the EU from 2014; figures for the United States
and China from 2015. †Figures are approximate, based on
an estimated minimum for the defence total and the average
of a range for the percentage spent on research.

But whereas researchers on Horizon 2020 
projects are expected to publish their results, 
or to patent products for anyone to license, the 
commission is likely to restrict how the results 
of defence-fund research are publicized, classi- 
fying some and restricting licensing to national 
ministries. The EU has no army of its own — 
although Juncker has said he would like to 
create one. Instead, national ministries, along- 
side members of the defence industry, will be 
involved in setting priorities for the scheme, says 


Roger. And unlike Horizon 2020, which wel- 
comes participation from more than a dozen 
‘associated’ countries, the defence research fund
is likely to be open only to EU member states 
and Norway. 

The project could boost certain fields. Roger 
says it will include research into metamaterials, 
which are made of tiny structures that manip- 
ulate the path of light and could potentially 
hide objects from radar, as well as methods of 
energy storage, flexible radio antennas that can 
be incorporated into clothing, and prototype 
maritime surveillance drones. 

“I would imagine that a lot of countries 
would definitely see this as another opportu- 
nity for funding,” says Ortwin Hess, a physicist 
at Imperial College London. He notes that US 
scientists working on photonics and meta- 
materials can readily access defence funding. 
“My US colleagues wouldn't survive without 
it. They live on it.”

Hess, who has received defence funding 
from the US and UK governments in the past, 
says he is a realist when it comes to what he 
calls the moral question. “I have to accept 
that our society has values that deserve to be 
defended,” he says. The military will adopt 
technologies developed in the civilian domain, 
and sometimes technology transfer can go the 
other way, he adds. 

But Parkinson says that defence research 
often supports military efforts beyond actual 
defence, as well as the export of weapons to 
other countries: “Our view is that we need a 
much stronger focus on R&D which contrib- 
utes to tackling the root causes of conflict — 
including a range of social and environmental 
problems.”
US scientists plan for
an uncertain future 


Concerned by the president-elect’s choice of advisers, 
researchers take steps to defend their fields. 


BY JEFF TOLLEFSON AND ALEXANDRA WITZE 
IN SAN FRANCISCO, CALIFORNIA 


US president-elect Donald Trump's
government is beginning to take shape, and
Earth scientists are getting nervous.

Trump’s latest Cabinet appointments include
former Texas governor Rick Perry, a climate
sceptic, for energy secretary, and ExxonMobil
chief executive Rex Tillerson for secretary of
state, a position that would make him the
United States’ lead emissary on climate change.
The pair helps to fill out a roster of advisers 
with strong ties to industry and a distaste for 
government regulation. Trump's transition 
team also asked the Department of Energy 
(DOE) for the names of employees who had 
worked on climate-change issues, further 
unsettling researchers. 

“It feels like a war on science, and on 
climate science in particular,” says Alan
Robock, a climatologist at Rutgers University 
in New Brunswick, New Jersey. “That's very
upsetting.” 

Scientists won a small battle on 14 December, 
when Trump's team disavowed the mem- 
orandum it sent to the DOE seeking 
information on climate-change programmes. 
The request sparked widespread outrage and 
drew a rebuke from the department after it 
was leaked on 9 December. At the Fall Meet- 
ing of the American Geophysical Union 
(AGU) last week in San Francisco, Cali- 
fornia, some researchers billed the episode 
as a blueprint for how they might defend 
their interests after Trump takes office on 
20 January. 

“There is power, even with an administration 
that never admits a mistake, in bringing 
things to light,” says Andrew Rosenberg, who 
heads the Center for Science and Democ- 
racy at the Union of Concerned Scientists in 
Cambridge, Massachusetts. 

Other researchers are copying government 
climate-data sets, to preserve them in case 
the Trump administration and the
Republican-controlled Congress follow through on
proposals to cut back Earth-science research
at NASA or otherwise restrict studies of global 
warming. One rescue effort had archived 
11 of 91 data sets on its list for preservation as 
of 16 December; these include a global temper- 
ature record maintained by NASA and palaeo- 
climate archives held by the National Oceanic 
and Atmospheric Administration (NOAA). 

Marcia McNutt, president of the US National 
Academy of Sciences, says that private founda- 
tions have expressed interest in “funding up 
to the order of billions of dollars” for climate- 
change research if the Trump administration 
reduces support for such work. But McNutt — 
who directed the US Geological Survey (USGS) 
from 2009 to 2013 — is not ready to give up on 
government science. “I don't want that to be an 
excuse for the government to pull away — to 
say private philanthropy can do this, the government
doesn’t need to fund it,” she told journalists
at the AGU meeting. 

The road ahead for scientists looks tough. 


Perry dealt with energy issues as governor of 
Texas, but he lacks experience with key areas 
of the DOE portfolio, says John Deutch, a 
chemist at the Massachusetts Institute of Tech- 
nology in Cambridge. Deutch, who leads the 
department's advisory board, says that Trump 
should identify a deputy energy secretary who 
understands the agency’s programmes on basic 
science, nuclear weapons and national security. 

And Perry is not the only climate sceptic 
poised to join Trump’s inner circle. Trump’s 
pick to lead the US Environmental Protection
Agency is Oklahoma attorney-general
Scott Pruitt, who has sued the federal 
government to overturn greenhouse-gas and 
air-quality rules. 

The president-elect has not announced 
whom he would like to run NASA, NOAA 
or the USGS, among other science agencies. 
McNutt says that the National Academies of
Sciences, Engineering, and Medicine have provided
his transition team with a list of potential
candidates, but none of those people has been
contacted by Trump staff.

Some scientists argue that even if policies to 
fight climate change are weakened or struck 
down under Trump, his latest nominations 
hint that there may be ways to promote clean 
energy. Tillerson has said that a carbon tax 
is the best way to address global warming. 
And although Perry is a strong proponent of 
fossil fuels, Texas’s wind-power production 
grew significantly during his governorship. 

“Those are places to insert a progressive 
agenda into an otherwise kind of ugly and 
cloudy landscape,” says Daniel Kammen, 
an energy researcher at the University of 
California, Berkeley. 

McNutt advises scientists to stay clear- 
eyed as they confront whatever challenges 
the Trump administration brings. “I see so
many people in this country freaked out,” she
says. “That is exactly what those who want to
disrupt science are hoping to achieve.”


NAVIGATION SATELLITES 


Galileo satellites herald new 
era for Earth sciences 


Europe and Asia will set the atmosphere abuzz with more radio-wave navigation signals. 


BY DECLAN BUTLER 


After soaring costs and years of delays,
Europe’s global satellite-navigation
system, Galileo, finally began beaming
its first signals to receivers in smartphones
and cars on 15 December.

The 18-strong fleet of satellites promises 
travellers another way to accurately locate 
their position on Earth, ending Europe’s 
dependence on the US Global Positioning 
System (GPS) and Russia’s GLONASS. But
Galileo, which was first proposed in 1999, is a 
big deal for science, too, says Richard Langley, 
an expert in navigation-satellite systems at the 
University of New Brunswick in Fredericton, 
Canada. What most excites scientists is the 
prospect of combining signals from multi- 
ple satellite networks, enabling new kinds of 
atmospheric and Earth-sciences research. 

Galileo’s constellation of satellites should 
reach its full complement of 30 in 2020, by 
which time China’s BeiDou system, compris- 
ing 35 satellites, is scheduled to enter service. 
Japan and India are also building regional 
systems. Altogether, the number of global nav- 
igation satellites encircling Earth is set to rise 
from around 90 today to at least 130 over the
next decade, estimates Oliver Montenbruck, a
physicist at the German Aerospace Center in 
Oberpfaffenhofen, Germany. At the same time, 
existing satellite fleets will be modernized. 

Earth's atmosphere will then be streaming 
with many more kinds of radio-wave signal at a 
greater variety of frequencies — each carrying 
information about the time and the position 
of the satellite that sent it. Sat-nav receivers use
data from multiple satellites to pinpoint their
own position. So simply having more satellites
overhead will help stop signal loss and
provide more accurate position fixes,
says Langley. “The more satellites you
have, the greater the precision,” adds Tonie
Van Dam, an Earth scientist at the University
of Luxembourg who uses receivers to monitor
how Earth’s crust deforms in response to
shifting water or ice.
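How a receiver turns several satellite signals into a position can be sketched in a few lines. The example below is a textbook-style least-squares (Gauss-Newton) solver, not code from any commercial receiver or from the Galileo programme. It assumes each measured "pseudorange" equals the straight-line distance to a satellite plus one shared receiver-clock error (expressed in metres), so there are four unknowns (three coordinates plus the clock error) and at least four satellites are needed; every extra satellite adds redundancy. The satellite coordinates are invented for illustration, and the sketch ignores atmospheric delays, satellite clock errors and Earth's rotation, which real receivers must correct for.

```python
import math

def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def estimate_fix(sats, pseudoranges, iterations=10):
    """Estimate receiver position (x, y, z) and clock offset b, all in metres.

    Assumed model: pseudorange_i = |sat_i - receiver| + b.
    """
    if len(sats) < 4:
        raise ValueError("need at least four satellites for four unknowns")
    x = y = z = b = 0.0  # start from Earth's centre with zero clock offset
    for _ in range(iterations):
        rows, resid = [], []
        for (sx, sy, sz), rho in zip(sats, pseudoranges):
            dx, dy, dz = sx - x, sy - y, sz - z
            rng = math.sqrt(dx * dx + dy * dy + dz * dz)
            # Partial derivatives of the predicted pseudorange w.r.t. (x, y, z, b)
            rows.append([-dx / rng, -dy / rng, -dz / rng, 1.0])
            resid.append(rho - (rng + b))  # measured minus predicted
        # Normal equations (H^T H) delta = H^T r give the least-squares update
        hth = [[sum(h[i] * h[j] for h in rows) for j in range(4)] for i in range(4)]
        htr = [sum(h[i] * r for h, r in zip(rows, resid)) for i in range(4)]
        step = solve_linear(hth, htr)
        x, y, z, b = x + step[0], y + step[1], z + step[2], b + step[3]
    return x, y, z, b

# Hypothetical satellite positions in an Earth-centred frame, metres
SATS = [
    (26_000e3, 0.0, 5_000e3),
    (20_000e3, 15_000e3, 8_000e3),
    (18_000e3, -12_000e3, 15_000e3),
    (21_000e3, 5_000e3, -16_000e3),
    (14_000e3, -18_000e3, -14_000e3),
    (12_000e3, 20_000e3, -10_000e3),
]

if __name__ == "__main__":
    truth = (6_371_000.0, 0.0, 0.0)  # a point on the surface of a spherical Earth
    clock_offset = 1_000.0           # receiver clock error, as metres of range
    rho = [math.dist(s, truth) + clock_offset for s in SATS]
    print(estimate_fix(SATS, rho))
```

Each additional satellite adds one row to the linearized system, which is why a sky with more satellites from several constellations tends to tighten the fix and to keep one available when some signals are blocked.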

Skies increasingly crowded with radio 
waves will also benefit weather forecast- 
ing and climate research. Scientists use the 
refraction of navigation-satellite signals in the 
Earth’s atmosphere to make measurements of 
atmospheric temperature, pressure, density
and water-vapour content. And the signals can
similarly be exploited to measure electron den- 
sity in the ionosphere, an electrically charged 
layer in the upper atmosphere. These data 
are used to track space weather and to moni- 
tor tsunamis and earthquakes, says Philippe 
Lognonné, a geophysicist at the Institute of 
Earth Physics of Paris. These events disturb 
air so violently that they send acoustic and 
gravity waves up to the ionosphere where they 
perturb electrons. With fully operational Gal- 
ileo and BeiDou systems, researchers should 
be better able to estimate tsunami heights, 
Lognonné says. 
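The ionospheric measurement relies on dispersion. To first order, the ionosphere delays a signal's code by about 40.3 × TEC / f² metres, where TEC is the total electron content along the path (electrons per square metre) and f is the frequency in hertz. The geometric range and clock errors are the same at every frequency, so differencing two pseudoranges from one satellite isolates the electron content. The sketch below is a minimal illustration under that first-order model, using the Galileo E1 (1575.42 MHz) and E5a (1176.45 MHz) centre frequencies. It yields slant TEC, an integral along the line of sight rather than a density profile, and it ignores the instrumental biases, multipath and noise that real processing has to handle. The range and TEC values in the demonstration are hypothetical.

```python
# First-order ionospheric delay model (an illustrative simplification)
K_IONO = 40.3        # m^3 s^-2; delay in metres = K_IONO * TEC / f^2
TECU = 1e16          # 1 TEC unit = 1e16 electrons per square metre
F_E1 = 1575.42e6     # Galileo E1 centre frequency, Hz (same as GPS L1)
F_E5A = 1176.45e6    # Galileo E5a centre frequency, Hz (same as GPS L5)

def iono_delay(tec, freq):
    """Group delay in metres for slant TEC (electrons/m^2) at frequency freq (Hz)."""
    return K_IONO * tec / freq ** 2

def slant_tec(p_high, p_low, f_high=F_E1, f_low=F_E5A):
    """Slant TEC (electrons/m^2) from pseudoranges (m) on a higher and a lower frequency.

    The lower frequency is delayed more, so p_low - p_high is positive; the
    geometric range and clock terms are common to both and cancel.
    """
    fh2, fl2 = f_high ** 2, f_low ** 2
    return (p_low - p_high) * fh2 * fl2 / (K_IONO * (fh2 - fl2))

if __name__ == "__main__":
    geometric_range = 22_000e3   # hypothetical satellite-receiver distance, m
    true_tec = 25 * TECU         # hypothetical electron content along the path
    p_e1 = geometric_range + iono_delay(true_tec, F_E1)
    p_e5a = geometric_range + iono_delay(true_tec, F_E5A)
    print(f"E1 delay {iono_delay(true_tec, F_E1):.2f} m, "
          f"E5a delay {iono_delay(true_tec, F_E5A):.2f} m, "
          f"recovered TEC {slant_tec(p_e1, p_e5a) / TECU:.2f} TECU")
```

Every satellite-receiver pair broadcasting on two or more frequencies provides one such line-of-sight sample, so more constellations mean denser sampling of the ionosphere.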

Scientists also plan to use multiple 
navigation-satellite constellations to improve 
measurements of ocean wind speeds and sea 
surface roughness, says Jens Wickert, a sci- 
entist at the GFZ German Research Centre 
for Geosciences in Potsdam. Today’s remote- 
observation ocean maps are built largely 
by bouncing radar waves off the sea from 
aircraft or spacecraft, and combining those 
data with information from other satellite 
instruments. The best current maps have a 
spatial resolution of around 80 kilometres 
and are updated every 10 days. Wickert aims 
to improve on that using orbiting receivers
for navigation-satellite signals.

Over the past five years, 18 Galileo satellites have been launched into orbit.

A European experiment called GEROS-ISS, which
Wickert is leading, aims to fly a receiver on 
the International Space Station in 2019. The 
experiment would measure navigation- 
satellite signals as they reflect off the sea. By 
combining data from Galileo, BeiDou, GPS 
and GLONASS, it could map the oceans at
spatial scales down to a few kilometres every
four days or less. Many ocean phenomena, 
such as eddies, occur at these scales, so bet- 
ter maps would help to improve weather and 
climate-change models. 

A fleet of receivers in space could provide 
even finer resolution. In a step in that direction,
on 15 December NASA launched its
own ocean-reflection research mission, the
Cyclone Global Navigation Satellite System. 
A fleet of eight microsatellites, each carry- 
ing four navigation-satellite receivers, will 
measure wind speeds and ocean roughness 
in the eyes of storms at unprecedented reso- 
lutions of a few kilometres every few hours. 
Chris Ruf, Cyclone’s principal investiga- 
tor and a remote-sensing scientist at the 
University of Michigan in Ann Arbor, 
says that the first mission will use GPS 
only, but he is keen to integrate data from 
Galileo and BeiDou in follow-ups. 

Much research on fusing signals from nav- 
igation-satellite systems is taking place under 
a federation of more than 200 agencies, uni- 
versities and research centres. Montenbruck, 
who heads this effort, cautions that it may take 
more than five years after Galileo and BeiDou 
enter full service before scientists can exploit 
their possibilities completely. “Today's use of 
GPS benefits from 30 years of experience and 
an excellent understanding and characteriza- 
tion of all the dirty details,” he says. “All that 
still needs to be carried out for Galileo and 
BeiDou.”




Major rethink for 
outbreak response 


World Health Organization aims to prevent crises similar 
to the West African Ebola epidemic. 


BY ERIKA CHECK HAYDEN 


Three years after the start of the world’s
worst Ebola epidemic, the World
Health Organization (WHO) has
created a programme to improve its response
to disease outbreaks and to prevent another
such calamity.

In June, WHO director-general Margaret
Chan named medical epidemiologist Peter
Salama to lead a new health-emergencies
programme intended to streamline the agency's
response to crises. As part of that programme, 
the WHO has launched the Emerging Diseases 
Clinical Assessment and Response Network 
(EDCARN) to provide guidance on how to care 
for people during disease outbreaks. 

Global-health experts say that the changes
are a step in the right direction, but both developing
and wealthy nations must do much more
to avert another devastating epidemic. Some
are also concerned that the WHO programme
will have trouble getting the funding it needs to
succeed, because of a lack of monetary support
from member nations.

“African countries are still so dependent 
on international and global outfits that the 
return of Ebola or any other disease will be 
another déjà vu of national unpreparedness,”
says virologist Oyewale Tomori at Redeemer’s 
University in Ede, Nigeria. 

Tomori says that many developing nations 
still don't have sufficient capacity for recogniz- 
ing and responding to an emerging infectious 
disease. 


BUILDING A BRIDGE 

The WHO’s new programme aims to 
strengthen local health systems and to bridge 
global, organizational and governmental 
efforts to prevent the next outbreak. Daniel 
Bausch, EDCARN’s technical lead, says that 
the network aims to fill huge gaps exposed dur- 
ing the Ebola crisis: a lack of knowledge about 
how best to care for people who have such seri- 
ous diseases, and a shortage of physicians and 
experts who are prepared to provide that care. 
“Clinical management of patients during
infectious-disease outbreaks has been one of 
the neglected areas of public health,” Bausch 
says. “We've realized that it’s one of the many 
areas where international support is extremely 
important.” 

So EDCARN is bringing together specialists 
in diseases that are likely to spark outbreaks 
— including Ebola, Middle East respiratory 
syndrome (MERS) and Crimean-Congo 
haemorrhagic fever. These people could be 
deployed to affected areas along with nurses, 
logistics experts and infection-control 
specialists to advise non-governmental 
agencies, governments and others on the 
ground who would be caring for patients. 

The hope is that this will prevent some of the 
problems with patient care and health-worker 
protections seen in the Ebola epidemic, in 
which many local and foreign health workers 
were among the 11,310 who died. 


FUNDING WORRIES 

Public-health policy analysts, such as 
Lawrence Gostin at Georgetown Univer- 
sity in Washington DC, are optimistic about 
EDCARN. “I think it is a helpful initiative, and 
is in line with some of the ideas put forward 
by the Ebola commissions,” says Gostin, who
served on two of the five international panels
that have recommended major reforms to the
global-health system in the wake of the Ebola
crisis.

A health worker disinfects a corpse in Monrovia, Liberia, during the recent Ebola outbreak.

Other health initiatives that were started in 
the aftermath of the Ebola epidemic have run 
into problems, which have tempered expecta- 
tions for EDCARN. A US-led effort to boost 
domestic health systems has met with local 
resistance, and a WHO-led programme with
similar goals does not have enough long-term
funding to keep going, Gostin says. 

To succeed, EDCARN must raise 
US$485 million for the 2016-17 fiscal year, and 
is currently only 56% funded. Gostin hopes 
that stinginess among the WHO member 
countries won't doom the fledgling attempts 
to head off another crisis.
An emperor penguin in
Antarctica’s Ross Sea,
which will now host the
world’s largest marine
reserve.


FROM AN ELECTION THAT STUNNED THE 
WORLD TO CATASTROPHIC TECHNICAL 
GLITCHES IN SPACE, RESEARCHERS 
WEATHERED A TURBULENT YEAR. 

BUT THEY ALSO ANNOUNCED SOME 
REMARKABLE ADVANCES — THE DIRECT 
DETECTION OF GRAVITATIONAL WAVES,
THE BIRTH OF A BABY WITH DNA FROM
THREE PEOPLE AND AN ARTIFICIAL 
INTELLIGENCE THAT CRACKED THE ONE 
BOARD GAME THAT COMPUTERS HAD 
YET TO MASTER. 
CATCHING A WAVE


Physicists bagged some big game this year. On 
11 February, researchers announced that they had 
finally sensed the ripples in the structure of space-time 
known as gravitational waves — capping a decades-long 
quest. The signal, spotted in September 2015 by the twin 
detectors of the Laser Interferometer Gravitational-Wave 
Observatory (LIGO) in Louisiana and Washington state, 
came from the merger of two black holes some 1 billion 
years before. 

The announcement was a stunning affirmation of 
Albert Einstein’s general theory of relativity, almost 
100 years after he had published it. And it provided the 
most direct evidence yet that black holes — another pre- 
diction of Einstein's theory — exist. Astrophysicists hailed 
LIGO’s feat as a triumph, saying that it heralded a new 
way of observing the cosmos, enabling the detection of 
phenomena that might not be picked up by other means. 

Just weeks after the LIGO announcement, another 
experiment demonstrated that the search for gravitational waves could 
one day occur in space. The European Space Agency’s LISA Pathfinder 
mission tested technologies for a future trio of probes that would sense 
gravitational waves coming from even larger and more-distant objects 
than the ones LIGO observed. 

Particle physicists were not so lucky. They spent much of the year 
holding their collective breath. Two separate experiments at the Large 
Hadron Collider near Geneva, Switzerland, had reported anomalous 
measurements in late 2015 that suggested the existence of a particle
six times as massive as the Higgs boson. 

At the time, experimenters warned that the anomalies could be 
statistical flukes. And more data released in August confirmed this. 
By then, theoreticians had written hundreds of papers in attempts to 
interpret the original data with a zoo of possible models. 


NEW WORLD ORDER 


A tumultuous US presidential campaign ended in a surprise victory 
for Republican businessman Donald Trump in November. Research- 
ers struggled to understand how a Trump administration would treat 
science — in part, because it did not feature prominently on the cam- 
paign trail. Still, some of Trump's views were clear: he has alleged that 
climate change is a hoax perpetrated by the Chinese, and has pledged 
to withdraw the United States from the Paris climate agreement. He 
has also suggested a link between autism and childhood vaccinations. 

As Trump’s administration began to take shape, researchers started 
to lobby against what they saw as an incoming president with little
use for science. In late November, more than 2,300 scientists,
including 22 Nobel laureates, sent a letter to Trump urging him to
“adhere to high standards of scientific integrity and independence
in responding to current and emerging public health and environmental
health threats”.


A sunlit part of Jupiter.

President-elect Donald Trump on the campaign trail in October.

The United Kingdom's 23 June vote to leave the European Union 
shook the country’s scientific community. Researchers remain worried 
about the fate of millions of euros in annual research funding from 
the EU and the immigration status of UK university staff from non- 
British EU nations. Some UK researchers have reported being cut out 
of EU collaborations, and some foreign scientists say that they feel so 
unwelcome that they plan to leave the country. In happier news, the 
UK government — led by a cabinet that came to power in the wake of 
the Brexit vote — announced a surprise funding boost in November 
worth £2 billion (US$2.5 billion) annually by 2020. 

A failed military coup in Turkey in July spelt upheaval of a differ- 
ent kind for academics: the Council of Higher Education promptly 
fired more than 1,500 university deans. About 58% of the positions 
have since been refilled by their former occupants. But more than 
6,500 professors have been dismissed on suspicion of involvement 
in the coup. Human-rights groups say that many of the dismissed 
are innocent. 

Political and economic woes rocked Venezuela, Brazil and South 
Africa this year — and did not spare researchers. Rolling blackouts, 
food queues and increasing violence prompted hundreds of scientists 
to leave Venezuela's universities, and, in some cases, the country. 
Brazil’s researchers are facing drastic budget cuts and 
the demotion of the science ministry, and have protested
against proposals to freeze federal science spending and 
weaken the country’s environmental laws. And austerity 
measures in South Africa have led to chronic under- 
funding of universities and triggered a rash of campus 
protests and violence. 


TO BOLDLY GO 

In the year that saw the 50th anniversary of Star Trek, 
technical glitches set back several space missions — but 
there were also notable victories. In March, the Japan 
Aerospace Exploration Agency’s flagship Hitomi X-ray 
astronomy satellite failed just weeks after launch. Inves- 
tigators determined that a software error had caused 
the spacecraft to rotate out of control and break apart. 
In July, NASA’s Juno probe arrived at Jupiter, but problems with its main engine delayed the rocket firing that would have shrunk its orbit into a tighter ellipse around the planet. The spacecraft continues to gather data on Jupiter’s atmosphere and magnetosphere on every fly-by, just more slowly than planned.

Meanwhile, the European Space Agency (ESA) mourned the end of two probes. In October, the Schiaparelli lander, part of ESA’s ExoMars mission, smashed onto the red planet’s surface after a measurement error caused its parachute and braking rockets to deploy at the wrong times. But at least its companion spacecraft managed to enter orbit around Mars. ESA’s other loss was sad for some scientists, but deliberate. The pioneering Rosetta spacecraft crashed onto the surface of Comet 67P/Churyumov-Gerasimenko in September as planned, radioing back close-up images before it lost contact and bringing the mission to an end.

Rising space power China garnered several wins. In August, it launched the first ever quantum satellite, aimed at testing ways to extend secure quantum communication into space. In September, the country completed construction of the world’s largest single-dish telescope, the Five-hundred-meter Aperture Spherical Radio Telescope, in the southwestern province of Guizhou. And in November, China launched the Long March 5 rocket, one of the world’s most powerful. It is meant to send people, rovers and heavy-duty planetary probes into space.

Finally, two Chinese astronauts broke their country’s record for the longest-duration space mission when they spent a month aboard the Tiangong 2 space laboratory in October and November.


BIOLOGY PREPRINTS ON THE RISE

Life scientists have not embraced preprints (studies published online before peer review) as readily as have physicists or mathematicians. But that is changing. Even the US National Institutes of Health is getting behind the practice, mandating that grant recipients in its 4D Nucleome consortium publish preprints of their work.

[Chart: life-sciences preprints per month. Annotations: from 2008 to 2014, arXiv garnered half a million manuscripts; preprints began to take off in 2013 with the start of bioRxiv.]


CRISPR IN COURT

The development of new applications for the genome-editing tool CRISPR-Cas9 continued apace. On 28 October, a patient with lung cancer at West China Hospital in Chengdu became the first person to be treated with cells edited using CRISPR-Cas9. As part of a clinical trial, researchers disabled a gene that normally keeps an immune cell’s activity in check, in the hope that the edited cells would mount an immune response against the cancer. More cancer trials using treatments based on CRISPR-Cas9 are expected in the United States and China next year.

But the commercial landscape for CRISPR-Cas9 therapies remains uncertain. The battle over US patent rights to the gene-editing technique reached fever pitch after the US Patent and Trademark Office declared an ‘interference’ proceeding between two research teams in January. The proceeding, which could conclude early next year, aims to determine who first invented the technique. Each team applied for patents that could be crucial for commercial applications.

Meanwhile, research using CRISPR-Cas9 in human embryos expanded this year. It is a controversial area of research that has raised concerns about the potential for designer babies, but regulators in some countries have approved projects in this field. Teams in China, the United Kingdom and Sweden announced their intentions to use the technique in embryos, both to optimize its use and to study human development. Work in the United States is expected to follow, despite a prohibition on the use of federal funds to study human embryos or to modify human eggs or sperm.


CLIMATE CRUNCH

Representatives of a record 174 countries and the European Union gathered on Earth Day, 22 April, to sign the international climate agreement forged in Paris in December 2015. But for the accord to come into force, at least 55 countries accounting for at least 55% of global greenhouse-gas emissions needed to submit ratification or acceptance documents.

The biggest boost came in September, when the United States and China, which together account for 38% of global emissions, formally joined the agreement. Brazil and 30 other countries joined a few weeks later, and the EU sealed the deal on 5 October. The pact came into effect on 4 November.

Green lights illuminate the Eiffel Tower to celebrate the Paris climate agreement.

But that wasn’t the only global climate deal afoot. On 6 October, the United Nations’ International Civil Aviation Organization agreed to curb emissions from international flights. And on 15 October, 197 countries agreed to amend the Montreal Protocol, which was designed to protect the ozone layer, to phase out hydrofluorocarbons, powerful greenhouse gases commonly used in air conditioners. Countries also broke a four-year-long impasse on 28 October to create the world’s largest marine reserve in the Ross Sea off the coast of Antarctica.

All the while, global warming continued. An epic El Niño in the tropical Pacific Ocean helped set global-temperature records in the first five months of the year. This put 2016 on track to break the annual global-temperature record for the third year running. The blazing warmth prompted corals around the world to bleach, a process in which the stressed animals expel the algae that help to keep them alive. The El Niño faded away in May, but coral bleaching continued throughout the year and is expected to persist into 2017.


ZIKA SPREADS 

In February, the World Health Organization (WHO) 
declared that clusters of birth defects linked to outbreaks 
of Zika virus in Brazil constituted a global public-health 
emergency. These birth defects included severe cases of 
microcephaly, a condition in which fetuses or newborns 
have abnormally small heads and brains. 

But the expected explosion in microcephaly cases and 
other Zika-linked birth defects across the Americas has not material- 
ized, despite the virus’s spread across the continents. 

Even in Brazil, extremely high rates of microcephaly remained 
confined to the country’s northeast region — and researchers began 
to suspect the influence of confounding factors. In July, Brazilian 
authorities launched a study to find out whether environmental, 
socio-economic or biological elements, when combined with Zika 
infection, could explain the odd distribution of elevated rates. They 
expect preliminary results early next year. 

On 18 November, the WHO declared the end of the international 
public-health emergency on the grounds that the link between Zika 
and birth defects had been established, and that the focus needed to 
shift to understanding the consequences of Zika infections, including 
the birth defects, and developing a vaccine. 

Several ongoing international research projects should produce 
results next year on a number of Zika-related questions, such as what 
proportion of infected pregnant women go on to have babies with 
birth defects. 


A four-month-old baby born in Brazil with microcephaly.

Go player Lee Sedol reviews a match against artificial intelligence.


MIND GAMES 


In January, a computer program beat a world-class human player at the 
ancient game of Go for the first time. But the ultimate showdown was in 
March, when the artificial intelligence (AI), called AlphaGo, trounced 
Lee Sedol — one of the world’s top players. The AI, developed by the 
Google-owned company DeepMind in London, opened with three con- 
secutive wins in the five-round tournament. Lee took the fourth game 
and nearly won game five, but AlphaGo triumphed. 

In October, DeepMind researchers debuted another AI, one capable 
of navigating the London Underground without any previous knowl- 
edge. The sophisticated program combined memory with the ability 
to learn from experience. This brought AI a step closer to performing 
human-like tasks such as reasoning. 

AI also helped to reduce errors in machine translation of languages by around 60%, and aided physicists looking for new super materials. These advances were largely powered by deep learning, which harnesses huge data sets and a hierarchical, brain-like method of computing.


CONTROVERSIAL CONCEPTION


After decades of research, assisted-reproduction tech- 
niques that mix DNA from three people are bearing 
fruit. These procedures prevent children from inherit- 
ing metabolic diseases caused by flaws in mitochondria, 
the cell’s energy-producing structures. 

In September, researchers working in a Mexican clinic 
reported the birth of the first healthy baby conceived 
through one such procedure. A baby in China was also 
reportedly born using the same technique. And in Octo- 
ber, a clinic in Ukraine announced that two previously 
infertile women had conceived through a similar proce- 
dure. On 15 December, following scientists’ advice, the 
United Kingdom’s Human Fertilisation and Embryology 
Authority said that the technique was ready for clinical 
use, which could start as soon as 2017. 


Written by Alison Abbott, Declan Butler, Davide Castelvecchi, Daniel Cressey, Elizabeth Gibney, Heidi Ledford, Jane J. Lee, Lauren Morello, Sara Reardon, Jeff Tollefson and Alexandra Witze.
STRIKING CRANES

Hundreds of thousands of sandhill cranes (Grus canadensis) converge on the Platte River in Nebraska as part of their annual migration. Photographer Randy Olson was taking long-exposure shots in March when lightning struck, creating these ghostly outlines.


366 DAYS: IMAGES OF THE YEAR


In a year of political turmoil and shock, 
science, too, came up with surprises. 

To document some of these wonders, 
photographers roamed the world, revealing 
objects from the microscopic to the cosmic 
in scale. 


Images selected by Nature’s art and design team 
Text by Daniel Cressey 
The largest and most accurate radio survey of the southern sky was unveiled in October by the high-resolution Galactic and Extragalactic All-sky Murchison Widefield Array (GLEAM) project. The Milky Way flows through this image, which encompasses more than 300,000 galaxies.
Whitson, Oleg Novitskiy and Thomas Pesquet to the International Space Station.


CHINA CHANGES

China this year revealed ambitious plans to cut coal use and pollution and to embrace renewable energy. But this steel plant in Inner Mongolia is just one example of the many industries that stand in the way of that reform.


SPACE STORMS 


Far below the International Space Station, 
lightning flashes illuminate the clouds, as 
human activity is revealed by clusters of 
lights. Two Russian spacecraft visiting the 
station can be seen in the foreground. 
CRYSTAL STEPS

These strange structures are calcium carbonate crystals, imaged at 2,000× magnification.


SEE-THROUGH AND SMALL

In August, a team in Germany unveiled ‘ultimate DISCO’, a technique that both renders tissues transparent and shrinks specimens, so that a whole animal can be imaged in one go. The technique can reveal the nervous system and organ systems within a body in unprecedented detail.


SACRED SYMBOLS 


In April, remarkable images of ancient Egyptian tattoos found on a mummy were shown at a meeting of the American Association of Physical Anthropologists. The tattoos include two seated baboons and a symbol of protection on the mummy’s neck.


STRIKING CELL 


This human stem cell is just 15 micrometres across, and was 
false-coloured after being imaged using cryogenic scanning 
electron microscopy. 
A physicist helped to catch the first direct
signs of long-sought gravitational waves. 


BY DAVIDE CASTELVECCHI 


A year ago, Gabriela Gonzalez was struggling to contain the biggest secret of her life. Two giant detectors in the United States had picked up signs of gravitational waves, wrinkles in space-time imagined by Albert Einstein but never before directly witnessed. It was Gonzalez’s job to help lead more than 1,000 scientists in their careful efforts to verify the discovery before announcing it to the public.

News like that doesn’t stay under wraps for long, but the discovery was so momentous that the research team took nearly five months to analyse data from the two Laser Interferometer Gravitational-Wave Observatory (LIGO) detectors in Washington state and Louisiana. As spokesperson for the LIGO Scientific Collaboration, Gonzalez was one of the key people coordinating the analysis by groups scattered around the world, including researchers at the Virgo interferometer near Pisa, Italy, which pools its data with LIGO.

The role of shepherding this massive effort made use of Gonzalez’s multidimensional talents. Most physicists know early on whether they will be a theorist or an experimentalist. But Gonzalez started her graduate studies as a theoretical physicist and only later switched to experimental work, when she showed uncommon aptitude. “It was the thing that set her up as a first-class scientist,” says Rainer Weiss, a physicist at the Massachusetts Institute of Technology in Cambridge and one of the founders of LIGO.

Throughout her career, Gonzalez has done “a bit of everything” at LIGO, she says. For a while, she took on the crucial task of diagnosing
the performance of the interferometers to make sure that they achieved
unparalleled sensitivity — which is now enough to detect length changes 
in the 4-kilometre-long arms of the interferometers to within one part 
in 10²¹, roughly equivalent to the width of DNA compared with the
orbit of Saturn. She has helped to lead the teams that analyse the data. 
And she nudged gravitational-wave researchers and dozens of their col- 
leagues in conventional astronomy into signing pacts of cooperation. 
Together, they will look for phenomena that emit both gravitational 
and electromagnetic waves, in what has been called the coming age of 
multimessenger astronomy. 

In the hectic months before announcing the LIGO discovery, 
Gonzalez and her colleagues struggled to make sure that they had 
iron-clad evidence. They knew that history had not been kind to those 
who had previously reported gravitational waves. Most recently, in 
early 2015, an international collaboration had to retract its claims 
that a telescope at the South Pole had discovered indirect signs of the 
long-sought vibrations. 

To add to the pressure on the LIGO team, rumours of a discovery 
began to leak within a week of the initial finding, and reporters started 
to call. Throughout the long analysis period, Gonzalez says, she never 
made an important decision without consulting colleagues. But others 
laud her leadership. “What Gaby did is, she managed to get us through 
this period,” Weiss says.

Gonzalez is based at Louisiana State University in Baton Rouge, close 
to the LIGO interferometer in Livingston. In 2008, she became the first 
woman to receive a full professorship in her department. She says that 
she has never experienced outright sexual harassment or discrimina- 
tion during her career, but “I had to prove myself perhaps more than 
other people”. 

Gonzalez has said that after her current term as LIGO spokesperson 
ends in March 2017, she will not run again. She plans to go back to full- 
time research. The field of science she helped to create — gravitational- 
wave astronomy — has just seen its dawn. “It has always been a fun ride. 
And now it’s even better.”
MIND CRAFTER


An AI developer beat one of the best 
at Go. Next up, solve global problems. 


BY ELIZABETH GIBNEY 


For veteran gamer Demis Hassabis, March brought the toughest match of his life, and he wasn’t even playing.

Hassabis had to watch from the sidelines as his team’s creation, the computer program AlphaGo, took on Lee Sedol, a top-ranked champion in the strategy game Go. The computer won, marking a huge victory for the field of artificial intelligence (AI) and another in a series of triumphs for Hassabis.

As co-founder of DeepMind, the London-based firm that developed AlphaGo, Hassabis was both elated and relieved. “It felt like our moonshot, and it was successful,” he says.

But the win was about much more than Go. Hassabis wanted to show the world the power of machine-learning techniques, which he hopes to someday harness in a human-like, general AI capable of solving complex global problems.

Hassabis had sketched this vision out as a precocious youth. A chess prodigy, he began designing innovative, multimillion-selling video games while in his teens and started his own company in his early 20s. After completing a PhD in cognitive neuroscience, he founded DeepMind in 2010. Google bought the company four years later for a reported £400 million (more than US$650 million at the time).

At the firm, researchers apply inspiration from neuro- 
science to eye-catching AI tasks, from synthesizing speech to 
navigating the London Underground. Each algorithm builds 
complexity on to the last, says Hassabis, and weaves in capa- 
bilities that have historically been developed separately in AI. 
DeepMind AIs have gone from learning how to see, and act-
ing on that vision, to using it to plan and reason. In terms of 
real-world problem-solving, the team used machine learning 
to cut power usage in Google's data centres by 15%, some- 
thing that Hassabis hopes to apply on a much grander scale.

Although the company’s researchers do publish, their 
work-in-progress is kept under wraps, which irks some aca- 
demics. And some data-privacy advocates have concerns 
over Google DeepMind’s plans to collaborate with the UK 
National Health Service. Scientists, however, have been 
flocking to work at the company. 

In person, Hassabis is unassuming but eager. He has 
a knack for swaying others to his passion, says Eleanor 
Maguire, his former PhD supervisor at University College 
London. “Once he gets talking about something he’s inter- 
ested in, it’s infectious,” she says. Fitting research alongside 
running the company now means saving science for the 
small hours of the morning, something Hassabis says he 
doesn't mind. “It’s a very important mission that we're on, 
and I think it's worth the sacrifice.”
COOLING AGENT


An atmospheric chemist laid the foundation 
for an international climate agreement. 


BY JEFF TOLLEFSON 


It isn’t often that atmospheric chemists get to help save the world, but Guus Velders had his chance in October. He was attending international negotiations in Kigali, Rwanda, that were seeking to phase out production and use of hydrofluorocarbons (HFCs), extremely potent greenhouse gases commonly used in air conditioners.

Most nations had agreed on an aggressive timetable to begin eliminating the compounds, but India and a handful of other countries wanted an extra four years. After plugging the numbers into a model on his laptop computer, Velders informed negotiators that this particular concession would have little impact on the planet.

That and his earlier work helped to smooth the way for a widely hailed global accord, which was signed on 15 October. Velders, a soft-spoken researcher at the National Institute for Public Health and the Environment in Bilthoven, the Netherlands, is proud of the part he played. “I’ve never been involved in a process that leads to a global agreement on climate before,” he says.

It was no coincidence, however. Colleagues say that Velders has become the world’s expert on HFC emissions, and that nobody else could have provided such rapid analysis in Kigali. He is part of a community of scientists that has helped to refashion the 1987 Montreal Protocol, an international agreement designed to protect the stratospheric ozone layer, into a tool with which to fight global warming.

The refrigerants that fall within the scope of the protocol are also powerful greenhouse gases, and Velders’ team showed that the Montreal agreement actually did more to control global temperatures than did the 1997 Kyoto Protocol climate treaty. More recently, the team projected how much warming HFCs were likely to cause over the twenty-first century. That helped to set the stage for the agreement on HFCs, which was reached as an amendment to the Montreal Protocol.

“The Velders team always answered the right questions at the right time,” says Durwood Zaelke, president of the Institute for Governance & Sustainable Development, an advocacy group in Washington DC. “It’s safe to say that we wouldn’t have this agreement without them.”

Now it’s back to the drawing board for Velders’ team. Their scenario about how HFC emissions would grow over time was rendered obsolete by the new agreement to ban them. That’s the kind of intellectual setback that Velders heartily accepts.


REEF SENTINEL


A coral researcher sounded the alarm over
massive bleaching at the Great Barrier Reef.


BY DANIEL CRESSEY


When Terry Hughes flew over the Great Barrier Reef in March, his heart sank at the sight of telltale pale patches just below the surface, where corals were dead or dying.

Hughes, director of the Australian Research Council’s (ARC’s) Centre of Excellence for Coral Reef Studies in Townsville, says that he and his students wept after looking at the aerial surveys of the damage. The bleaching hit nearly all of the reef, with initial surveys showing 81% of the northern section suffering severely. It was the most devastating bleaching ever documented on the Great Barrier Reef, and part of a wider event that was harming corals across the Pacific.

The trigger for this year’s coral troubles in the Pacific was a strong El Niño warming pattern in the tropical part of that ocean. Abnormally high water temperatures prompt corals to expel the symbiotic zooxanthellae algae that provide them with much of their food (and their colour). Some corals can recover after bleaching, but others die. Follow-up studies in October and November found that 67% of shallow-water corals in the 700-kilometre northern section of the Great Barrier Reef had died.

When the massive El Niño reared up in the Pacific in 2015, Australian researchers feared that the country’s reefs could be in danger. So Hughes, one of the world’s leading coral researchers, assembled a task force ready to survey the reef if bleaching occurred. The group eventually expanded to 300 scientists. “We put together a very detailed research plan, hoping of course that it wouldn’t happen,” he says.

Hughes is based close to the central portion of the Great Barrier Reef. After leading the initial surveys, he became the de facto spokesperson on the catastrophe. At the height of media interest in the bleaching, Hughes did 35 interviews in one day.

“In Australia, even people who have never been to the Great Barrier Reef and might never go there regard it as an icon,” says Bob Pressey, a fellow researcher at the ARC centre.

The crisis on the reef defied some rules. Conventional thinking on bleaching events, says Hughes, is that corals die slowly from starvation after their zooxanthellae leave. But this year, water temperatures were so high that “we saw a lot of corals die before the starvation kicked in. They actually cooked.”

Corals throughout the world have struggled in the past couple of years, as global temperatures have repeatedly hit record highs. In October 2015, the US National Oceanic and Atmospheric Administration declared that a global bleaching event was happening as coral reefs in Hawaii, Papua New Guinea and the Maldives began to succumb.

This year, the bleaching spread to Australia, Japan and other parts of the Pacific. Researchers say that, as climate change drives up baseline temperatures, bleaching will afflict reefs more frequently. Under some scenarios, this could happen so often that most corals can no longer survive.

Hughes is not ready to give up on the Great Barrier Reef just yet. But the recent bleaching has left corals in a weakened state, prone to attacks from pathogens and predators. Another bleaching event in the near future could bring further damage. “The message to people,” he says, “should be we’ve got a closing window of opportunity to deal with climate change.”
ZIKA DETECTIVE


A physician raced to make sense of a
medical mystery in northeast Brazil.


BY DECLAN BUTLER


Fears about the Zika virus spread across the globe in 2016, and the epicentre of concern was Brazil, where the epidemic first appeared in the Americas. Some researchers even called for postponing the Olympic Games scheduled for Rio de Janeiro in August that year. But away from the media frenzy, Celina Maria Turchi Martelli battled on the front lines in northeast Brazil to make sense of the medical mystery there.

Turchi, a physician and infectious-disease expert, has had her life turned upside down by Zika since September 2015. That’s when the ministry of health asked her to investigate a sharp rise in reports of babies born with abnormally small heads and brains, a condition known as microcephaly, in her home state of Pernambuco. She quickly became convinced that the country was facing a public-health emergency. “Not even in my worst nightmare as an epidemiologist had I imagined a microcephaly neonate epidemic,” she says.

Turchi, who is based at the Aggeu Magalhães Research Center in Recife, immediately contacted scientists across the globe for help. She formed a networked task force of epidemiologists, infectious-disease experts, paediatricians, neurologists and reproductive biologists. The challenges were formidable, says Turchi: there were no reliable lab tests for Zika, and there was no consensus on a case definition of microcephaly. But the intense networking paid off, and Turchi and her colleagues eventually generated enough evidence to demonstrate a link between the condition and infection with Zika in the first trimester of pregnancy.

Still, the mysteries are far from solved, says Turchi. Although Zika has spread across the Americas, the expected explosion in the number of microcephaly cases outside northeast Brazil has not materialized. Turchi and her task force are now trying to work out why. When she started going into the hospitals of Recife to investigate the outbreak, Turchi says, she had to innovate. “There was no book to follow.” Now, she and her colleagues are writing that book.
PAPER PIRATE


The founder of an illegal hub for paywalled 
papers has attracted litigation and acclaim. 


BY RICHARD VAN NOORDEN 


It took Alexandra Elbakyan just a few years to go from information-technology student to famous fugitive.

In 2009, when she was a graduate student working on her final-year research project in Almaty, Kazakhstan, Elbakyan became frustrated at being unable to read many scholarly papers because she couldn’t afford them. So she learnt how to circumvent publishers’ paywalls.

Her skills were soon in demand. Elbakyan saw scientists on web forums asking for papers they couldn’t access, and she was happy to oblige. “I got thanked many times for sending paywalled papers,” she says. In 2011, she decided to automate the process and founded Sci-Hub, a pirate website that grabs copies of research papers from behind paywalls and serves them up to anyone who asks. This year, interest in Sci-Hub exploded as mainstream media cottoned on to it and usage soared. According to Elbakyan’s figures, the site now hosts around 60 million papers and is likely to serve up more than 75 million downloads in 2016, up from 42 million last year and, by one estimate, encompassing around 3% of all downloads from science publishers worldwide.

It is copyright-breaking on a grand scale, and has brought Elbakyan praise, criticism and a lawsuit. Few people support the fact that she acted illegally, but many see Sci-Hub as advancing the cause of the open-access movement, which holds that papers should be made (legally) free to read and reuse. “What she did is nothing short of awesome,” says Michael Eisen,


FERTILITY REBEL 


A physician jump-started debate over a
controversial IVF procedure. 


BY SARA REARDON 


Gi anger, scepticism and congratulations. Those were some of the reactions that fertility specialist John Zhang triggered in the scientific community in September, when he announced that a controversial technique that mixes DNA from three people had been used to produce a healthy baby boy.

This kind of technique is intended to prevent children from inheriting disorders involving mitochondria, the cellular structures that produce energy. But ethical and safety concerns have prompted the United States to ban such procedures without a permit. Zhang, who works at New Hope Fertility Center in New York City, performed the technique at the company’s clinic in Mexico.

Critics saw this as an attempt to evade regulation, and complained that he had announced the work at a conference rather than in a publication.

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But Zhang brushes aside those objections. “The most important is to 
have a live-birth baby, not to tell the whole world,’ he says. 

Zhang has a habit of pushing scientific and ethical boundaries. In 
the 1990s, he worked with reproductive endocrinologist Jamie Grifo 
at the New York University Langone Medical Center to develop a 
version of the technique that Zhang used this year. The approach was 
designed to help older women to become pregnant by replacing their 
ageing mitochondria with those from younger eggs. No successful 
pregnancies resulted. 

When US regulators began restricting this technique in 2001, Zhang 
and his collaborators in China took over the work. In 2003, Zhang's team 
created and implanted multiple embryos into a woman. After all the 
fetuses were miscarried, China banned the technique as well. 

Grifo and some others applaud Zhang’s latest work. “I think it’s a great 
thing it was finally done,” says Grifo. But others have criticized the New 
Hope team. “A lot of things they did were completely unsafe,’ such as 
infusing the donor's egg with a drug that could cause chromosomal 
abnormalities, says Shoukhrat Mitalipov, a stem-cell scientist at Oregon 
Health & Science University in Portland. 

Zhang is undeterred. He says that plenty of other families at risk 
of mitochondrial disease have expressed interest in his procedure, 
and he hopes to perform it in other countries. “Five to ten years from 
today, people will look at it and say, “Why were we all so stupid, why 
were we against it?” he says. “I think you have to show the benefit to 
mankind” = 
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a biologist and open-access supporter at the University of California, 
Berkeley. “Lack of access to the scientific literature is a massive injustice, 
and she fixed it with one fell swoop?” 

For the first few years of its existence, the site flew under the radar 
— but eventually it grew too big for subscription publishers to ignore. 
In 2015, the Dutch company Elsevier, supported by the wider publish- 
ing industry, brought a US lawsuit against Elbakyan on the basis of 
copyright infringement and hacking. If Elbakyan loses, she risks having 
to pay many millions of dollars in damages, and potentially spending 
time in jail. (For that reason, Elbakyan does not disclose her current 
location and she was interviewed for this article by encrypted e-mail 
and messaging.) In 2015, a US judge ordered Sci-Hub to be shut down, 
but the site popped up on other domains. 

Elbakyan has found her name splashed across newspapers, and says 
she typically gets a hundred supportive messages a week, some with 
financial donations. She says she feels a moral responsibility to keep 
her website afloat because of the users who need it to continue their 
work. “Is there anything wrong or shameful in running a research- 
access website such as Sci-Hub? I think no, therefore I can be open 
about my activities,” she says. 

Critics and supporters alike think that the site will have a lasting 
impact, even if it does not last. “The future is universal open access,” 
says Heather Piwowar, a co-founder of Impactstory, a non-profit firm 
incorporated in Carrboro, North Carolina, which helps scientists track 
the impact of their online output. “But we suspect and hope that Sci- 
Hub is currently filling toll-access publishers with roaring, existential 
panic. Because in many cases that's the only thing that’s going to make 
them actually do the right thing and move to open-access models” 

Whether or not that’s true, Elbakyan says she will keep building 
Sci-Hub — in particular, to expand its corpus of older manuscripts 
— while studying for a master’s degree in the history of science. “I 
maintain the website myself, but if 'm prevented, somebody else can 
take over the job,” she says. = 
CRISPR CAUTIONARY


A budding biologist put gene-drive ethics
before experiments.


BY HEIDI LEDFORD


It was a trip to the Galapagos Islands at the age of ten that first whetted Kevin Esvelt’s appetite for tinkering with evolution. As he stood marvelling at the iguanas, birds and sheer diversity of the place that had inspired Charles Darwin, Esvelt vowed to understand evolution — and improve on it. “I wanted to learn more about how these creatures came to be,” he says. “And, frankly, I wanted to make more of my own.”

Today, Esvelt is still a precocious biologist. Less than a year after launching his lab at the Massachusetts Institute of Technology Media Lab in Cambridge, he has already made a name for himself as one of the pioneers of a controversial technique called a gene drive. His method harnesses CRISPR–Cas9 gene editing to circumvent evolution, forcing a gene to spread rapidly through a population. It could be used to wipe out mosquito-borne diseases such as malaria or eradicate invasive species. But it could also set off unintended ecological chain reactions, or be used to create a biological weapon.

The idea of CRISPR gene drives hit Esvelt when he was tinkering with the Cas9 enzyme in 2013. “I had one day of absolute, ecstatic glee: this is what’s going to let us get rid of malaria,” says Esvelt. “And then I thought, ‘Wait a minute.’”

Following that thought, Esvelt has worked to ensure that ethics comes before experiments. He first sounded the alarm in 2014, calling for public discussion about gene drives even before he had demonstrated that a CRISPR–Cas9 gene drive could work (K. A. Oye et al. Science 345, 626-628; 2014 and K. M. Esvelt et al. eLife 3, e03401; 2014). Since then, he and his colleagues have shown how gene drives might be made safer, and how they could be reversed (J. E. DiCarlo et al. Nature Biotechnol. 33, 1250-1255; 2015).

This year, his advocacy has begun to bear fruit. Researchers and policymakers worldwide have been discussing the technology, and a report from the US National Academies of Sciences, Engineering, and Medicine urged that gene-drive research proceed, but cautiously. Omar Akbari, who studies gene drives at the University of California, Riverside, believes Esvelt’s outreach has focused public attention — and attracted funding — for a nascent technology at just the right time. “I attribute that to Kevin,” says Akbari. “It’s difficult for a scientist to do what he’s done.”
DIVERSITY TRAILBLAZER


A transgender physicist paved the way for
greater acceptance of minority groups.


BY ELIZABETH GIBNEY


Physicists can be open to seeing the world in new ways, but they need to see the data first. This posed a problem for Elena Long, a nuclear physicist who has fought for her field to be more inclusive of people from sexual and gender minorities. “We didn’t have any data, because people considered it too offensive to ask if we exist. It was a catch-22.”

Long was one of the architects of a first-of-its-kind survey run by the American Physical Society (APS), charting the experiences of physicists who are lesbian, gay, bisexual, transgender or from another sexual or gender minority (LGBT).

The findings, presented to a packed room at the APS March meeting this year, were stark. Of the 324 scientists who responded, more than one in five reported having been excluded, intimidated or harassed at work in the previous year. Transgender physicists reported the highest incidence of discrimination. Long, who is transgender herself, was unsurprised. In 2009, she began work for her PhD at the Thomas Jefferson National Accelerator Facility in Newport News, Virginia, which lacked trans-inclusive employment protections and health-care benefits. She felt isolated without LGBT support networks. “I loved the work I was doing, and I loved the research. But it was rough,” she says.

So she founded the LGBT+ Physicists support group and began pushing for greater recognition at the APS, which eventually created a committee to collect data on LGBT discrimination. Many physicists, she says, could not even understand the need for such a study. Thanks to Long and her colleagues, physics is emerging as exemplary in its approach to these issues, says Samuel Brinton, a board member of the society Out in Science, Technology, Engineering and Mathematics. “We are literally using their work to start changes for the better in multiple fields,” he says.

The APS accepted the recommendations made in the March report. And in August, a major APS division voted to move its 2018 meeting out of Charlotte, North Carolina, in response to a state law that forces people to use public toilets that match the gender they were assigned at birth.

Long has meanwhile won two young-scientist awards offered by her lab and become a co-leader on two new accelerator experiments. “I’ve known a lot of postdocs who’ve done voluntary work, and usually it compromises their science,” says Karl Slifer, Long’s postdoctoral supervisor at the University of New Hampshire in Durham. “I’ve never seen that in Elena.” (Long attributes her strict time management to a computer program she designed that charts every hour of her day.)

Now Long is helping to set up an APS membership group focusing on diversity and inclusion, which she hopes will make it easier for scientists in other minority groups to flourish. “I’m sure there are other people facing problems in the field I never thought about,” she says. “I don’t want them to wait seven years to get to a place where they can have a voice.”


An astronomer detected the nearest known
planet outside the Solar System.


BY ALEXANDRA WITZE


Guillem Anglada-Escudé wasn’t surprised early this year when evidence of an alien world rippled across his computer screen. He had been almost certain that an Earth-sized planet orbited Proxima Centauri, the star nearest the Sun at just 1.3 parsecs (4.2 light years) away.

To Anglada, an astronomer at Queen Mary University of London, the discovery came as more of a relief than a shock. He and his colleagues had been working feverishly to stake their claim in the competitive world of planet hunting, and the Proxima find confirmed that they were on the right path. “We made it,” he says.

To the rest of the world, the discovery of the closest known exoplanet to Earth stoked the public imagination. It raised questions about whether life might exist in our cosmic backyard, and whether astronomers might be able to detect it.

These are the kinds of question that got Anglada into planet hunting in the first place. A science-fiction fan while growing up near Barcelona, Spain, he got his astronomical start doing data simulations for Gaia, a European Space Agency mission to map 1 billion stars. Later, he turned his data-crunching skills to exoplanets. He developed a method for extracting faint planetary signals from data gathered by the world’s premier ground-based planet-hunting instrument, the High Accuracy Radial velocity Planet Searcher (HARPS) at the European Southern Observatory in La Silla, Chile.

“Guillem has a natural talent of seeing the big picture where others see details,” says Mikko Tuomi, an astronomer at the University of Hertfordshire in Hatfield, UK, and a collaborator of Anglada’s.

But Anglada soon ran straight into high academic drama, tussling with other researchers over who deserved credit for discovering a planet bigger than Earth and smaller than Neptune orbiting the star Gliese 667C. “I could have left the field and done something else,” he says. “But I took the decision of following it very aggressively.”

He dived into HARPS data, publishing paper after paper on the planetary signals he discovered amid the background noise in the data. And then, as if to push back on all the secrecy and competition, Anglada launched a very public hunt for a planet orbiting Proxima.

He put together a team and got observing time on HARPS, as well as other telescopes that could double-check whether any promising evidence that they found was caused by stellar activity, which can mimic the signs of a planet (a problem that plagues many exoplanet claims). The researchers put nearly all their details on an outreach website and social-media accounts. Being so transparent “didn’t seem dangerous at all”, Anglada says. “We had a feeling nobody else would do this.”

Within days, they confirmed that the planet was there; within weeks, they submitted a manuscript detailing their discovery. The planet, called Proxima b, is at least 1.3 times the mass of Earth and orbits Proxima every 11.2 days.

Although it is close to its star, the world is within the ‘habitable zone’, where liquid water could exist on its surface. That makes it not only the closest known exoplanet of the 3,500-plus confirmed so far, but also a place where otherworldly life could thrive — a double bonus for researchers and science-fiction fans alike.

Just before the paper was published in Nature in August (G. Anglada-Escudé et al. Nature 536, 437-440; 2016), Anglada e-mailed British sci-fi writer Stephen Baxter, author of the novel Proxima (Gollancz, 2013). They corresponded about what life might be like on a world with one hemisphere permanently facing a flaring star, as happens at Proxima.

People could eventually get a close-up look at Proxima b. The Breakthrough Starshot initiative aims to send fleets of tiny laser-propelled spacecraft to a nearby star, and it may target Proxima as its closest and best option.

Anglada’s next step is to see whether Proxima b transits, or passes across the face of its star as seen from Earth. The chances are low, but if it does, then much more science can be gleaned when Proxima’s light passes through the planet’s atmosphere, if it has one.

And if the transit does not happen? Then Anglada may be off to tease out some other signal of another world.
Marine ecologist Jane Lubchenco (right) has helped to launch training programmes that help scientists to engage with policymakers.


So you want to change the world?


In these tumultuous times, Nancy Baron urges scientists to speak from 
the heart to build public trust in research. 


Fifteen years ago, when I began to work as a communications coach, few scientists felt that engaging outside academia was part of their job. Now, when I ask workshop participants to stand if they ‘want to change the world’, almost all rise to their feet. Often, they look around in astonishment to see so many peers standing with them.


I work largely with environmental scientists: ecologists, hydrologists, fisheries researchers and others documenting the escalating challenges that face our planet. Some are moved by the disappearance of the wildlife they study, or the brooding omnipresence of climate change. Others are dismayed by divisive political discourse and election results, where evidence may have been devalued or dismissed. All are compelled to reach out.

This marks a radical shift. A decade ago I spent much of my time persuading scientists of the value of outreach to the public and policymakers. Today, most consider such communication crucial: they recognize that the publication of research in journals such as this one is not an end in itself, but a launch pad to further things. It is no longer a question of ‘Should I do this?’, but rather, ‘How do I do this?’

Fortunately, there’s a quiet revolution 
happening, with workshops, guide books¹,²
and much discussion on social media about 
effective communication. This revolution is 
buttressed by social-science research on the 
benefits of turning stereotypically stiff and 
formal scientists into warm, accessible people. 
And it is supported by a growing network of 
scientists who mentor each other's outreach. 
There has been a shift from peer pressure to 
keep quiet, to peer support to speak up. 

This year more than ever it has become 
clear that the revolution can be kept quiet no 
longer. It is time for scientists to take social 
responsibility and to be recognized and 
rewarded for doing so. 


JUDGEMENT DAY 

Fear of being judged by one’s peers still haunts 
many academics. No matter who they are 
talking to, scientists often fret about how they 
might sound to their colleagues instead of 
concentrating on the audience at hand. They 
also worry about being subjected to the sort 
of abuse hurled at climate scientist Michael 
Mann. His famed hockey-stick graph, pub- 
lished in 1999, reconstructed a temperature 
record over the past 1,000 years. His resolve to 
speak out about his findings plunged him into 
a maelstrom of hacked e-mails and smears by 
climate-change deniers. 

But a 2014 study³ showed that scientists
might risk less criticism than they think — at 
least from each other. A survey carried out 
at nine ecological and environmental con- 
ferences found that, to a surprising degree, 
scientists wanted to engage with the public 
and policymakers more. Many respondents 
believed that scientists should interpret and 
even advocate for the use of their science. A 
decade earlier, a similar study showed that 
ecological scientists strongly disagreed with 
the idea of active advocacy⁴.

The question of how scientists should 
engage is a deeply personal one⁵. Neverthe-
less, there is a gap between desire and action.
In the 2014 survey, the main barriers were a 
self-perceived lack of competence at navigat- 
ing the science-policy interface, as well as 
past negative experiences and institutional 
norms that did not support them. Lack of 
time and resources, which often spring to 
mind first, were lesser factors. The survey 
found that the more scientists were aware of 
how their work fits into the policy landscape, 
the more likely they were to get involved. 


‘HOW TO’

There are organizations that can help. Two examples are COMPASS, a non-profit, non-advocacy programme (for which I work) that helps scientists to engage effectively in the public discourse about the environment; and the Leopold Leadership Program, now a part of the Stanford Woods Institute for the Environment in California. Both were launched under the leadership of Jane Lubchenco, a marine ecologist at Oregon State University in Corvallis (before she became chief of the National Oceanic and Atmospheric Administration), and others. These pioneering programmes were created to, as Lubchenco has said, aid “faster and more effective transmission of new and existing knowledge to policy and decision-makers and better communication of this knowledge to the public”.

Strident campaigning is just one way to stand up for science. Warm and well-timed conversations with key players are crucial too.

Since 2000, my colleagues and I at 
COMPASS have worked with thousands of 
environmental scientists. Over the years we 
have learned that teaching them commu- 
nication skills is necessary — but not suf- 
ficient. Between the desire to communicate 
and actual engagement, there is a ‘valley of 
death’ that can stop scientists from trying 
out their skills in the real world. They need 
guidance on how to bridge that gulf. 

The first step is to examine the big 
picture and determine what your role in 
a conversation might be. What do you 
uniquely bring to this issue? Next, analyse 
potential opportunities: identify the players, 
including the decision-makers and stake- 
holders, and determine what they need — 
and when. This is the key to making advice 
relevant and timely. Professional societies, 
government-relations departments at some 
universities, and organizations such as 
COMPASS can help to broker connections.
So, too, can other scientists. 


LINKED IN 

Once a contact is made, a single successful 
interaction often creates new opportuni- 
ties. That first action might be an opinion 
piece in a local paper, speaking to commu- 
nity members, or an e-mail to someone in 
local government that meets with a positive 
response. Even just preparing properly for 
media attention to an academic publication 
can launch conversations with decision- 
makers that take on a life of their own. 

This is what happened to Jenna Jambeck, 
who studies waste-management engineer- 
ing at the University of Georgia in Athens. In 
2015, a paper by Jambeck’s team about inter- 
national data on how much plastic debris 
enters the ocean each year was published 
in Science⁶. Because her working group of
scientists was at the National Center for 
Ecological Analysis and Synthesis, where 
COMPASS has a staff presence, she asked for 
guidance. We helped whittle her 26 statistics 
down to a few compelling points: “Humans 
dump 8 million tonnes of plastics into the 
oceans each year. That's five grocery bags 
of plastic for every foot of coastline in the 
world,” for instance. When the media flood
came, she was prepared. 

This launched Jambeck’s role in the politi- 
cal conversation about plastic waste. Since 
then, she has become a sought-after expert 
at congressional hearings and around the 
globe. Her science is important, but it is
also her conversation that makes her so
successful. Her voice resonates with people 
because of her optimism that this problem 
can be solved, along with her willingness to 
show her passion for recycling and waste 
management. One of her best lines is that 
she met her husband at the landfill. 

Jambeck’s friendly tone matters. A study 
this year revealed that scientists most 
prioritize communication designed to edu- 
cate and defend against misinformation, and 
that they least prioritize communication 
designed to build trust and resonance with 
the public⁷. Yet research shows that people’s
willingness to listen is linked to how likeable,
warm and authentic they find the speaker⁸.
Building trust requires a human touch. 

Ken Caldeira, an atmospheric scientist at 
the Carnegie Institution for Science in Stan- 
ford, gave a lecture on World Oceans Day to 
leaders of the United Nations Educational, 
Scientific and Cultural Organization in Paris 
this year. He presented the perils of acidify- 
ing oceans but reframed them as a question: 
how can we help the oceans help us? This 
empathetic approach resonated. Focusing 
on solutions, and using inclusive language 
such as ‘we’ and ‘us’ when talking to audi-
ences, makes scientists — and their findings
— more approachable. Caldeira has also 
embraced social media. Twitter, Facebook 
and other social platforms are key to shift- 
ing public opinion, as events this year have 
shown. Scientists should join these conversations to extend their reach.

Sometimes, despite doing the right work and reaching
out to the right people, the science does not 
prevail. Jonathan Moore, a coastal scientist 
at Simon Fraser University in Vancouver, 
Canada, spent four years tallying up the 
social and environmental havoc that would 
be caused by a liquid natural-gas terminal 
on the northern coast of British Columbia: it 
would disrupt habitats that support salmon 
relied on by 11 First Nations groups over an 
area the size of Switzerland, his team found. 
Although Moore and his colleagues have 
published in the peer-reviewed literature⁹,
been covered by hundreds of media outlets 
and met with communities and policymak- 
ers, the terminal was granted government 
approval by a regulatory review process that
many Canadian scientists find lacking. 


SUPPORT NETWORK 

Public engagement takes perseverance and 
courage. It also needs emotional support. It’s 
a long haul, and hard to do alone. In 2014, 
Moore was a member of the first cohort of 
the Wilburforce Fellowship, run by a chari- 
table foundation in Seattle, Washington. The 
year-long programme is designed to provide 
leadership and communications training to 
scientists at all career stages to form last- 
ing networks of support across Canada, the 
United States and Mexico. Initial training
lasted only a week, but fellows continue to
share successes, failures and advice for resil- 
ience through social media and e-mail. At 
conferences, they stay up late discussing how 
to advance their goals and offer each other 
encouragement. 

Some of the earliest fellowships, includ- 
ing the Leopold programme, were intended 
specifically for tenured academics, because 
outreach was seen as a career risk. This is 
changing. The next generation of young 
scientists is most eager to change the world, 
and is stepping up to do so. 

Canada’s government, elected last year 
in place of an administration that was 
accused of muzzling government scien- 
tists, has promised to usher in an era of evi- 
dence-based decision-making. More than 
1,800 early-career researchers have contrib- 
uted recommendations on how to rebuild 
confidence in environmental assessments 
and regulatory reviews — so far with mixed 
results (see youngresearchersopenletter.org). 

Sally Otto, an evolutionary biologist at the University of British Columbia in Vancouver, directs the Liber Ero (Latin for ‘I will be free’) fellowship. It funds early-career scientists to do applied research on important conservation issues and provides training in public and political outreach. Otto also donated some of her 2011 MacArthur ‘genius grant’ to launch a policy fellowship that embeds scientists in government in Ottawa. The scheme is run by Mitacs, a Canadian non-profit organization that specializes in partnerships and placements between academic and non-academic institutions.

Young scientists are under massive pres- 
sure to win grants and publish. So a growing 
cadre of senior scientists is instigating fellow- 
ship programmes that provide early-career 
researchers with communication skills and 
connections, and in some cases also funding. 

Such efforts are changing the academic 
reward systems, albeit too slowly. Several 
institutions now have tenure packages that 
recognize communication and outreach, 
along with conventional measures of pub- 
lication and teaching success. Lisa Graum- 
lich, dean of the College of the Environment 
at the University of Washington in Seattle, 
has expanded her definition of impact to 
include public engagement, which she con- 
siders a logical product of engaged scholar- 
ship. Some argue that engagement, although 
harder to measure than citations, is a better
proxy for academic success¹⁰.

Training on how to extend into the public 
realm and understand the workings of gov- 
ernment shouldn't be available only through 
boutique fellowships or under the wings of
a few motivated senior faculty members. It
should be a part of every young scientist's 
education. In 2014, COMPASS led a work- 
ing group funded by the US National Science 
Foundation (Building Systemic Communi- 
cation Capacity for Next Generation Scien- 
tists) to assess the science-communication 
workshops and training available to US 
graduate students in science, technology, 
engineering and mathematics. We provided 
recommendations for integrating science- 
communication skills into graduate educa- 
tion (see compassblogs.org/gradscicomm). 
So far, these early efforts, in need of funding, 
have languished. 


SPEAK OUT 

In these uncertain times, the voices of scien- 
tists are more important than ever. Efforts 
should not only target political leaders, but 
also aim to create a groundswell of public 
support. Ultimately, leaders must listen to 
their constituents. This is a time for scientists 
to double down and launch a ground cam- 
paign for the hearts and minds of the public. 

Society needs to hear from those who can 
explain empirical evidence in a way that 
resonates with people’s values, whatever 
they may be. We all need to be more open- 
minded and inclusive — and we need to 
muster the courage to speak from the heart 
and learn to listen with empathy. 

My experience is that scientists can 
emerge as powerful agents of change. By 
building capacity, collaboration and con- 
fidence among researchers, we can bolster 
public engagement, inform decision-making 
and inspire society to forge a better future.


Nancy Baron is science outreach director for
COMPASS, a public-engagement organization
for scientists, and author of Escape from
the Ivory Tower: A Guide to Making Your
Science Matter. She is based at the National
Center for Ecological Analysis and Synthesis in
Santa Barbara, California, USA.

e-mail: nbaron@compassonline.org 
Take the long view


At the end of a difficult year for evidence, Ian L. Boyd, a chief scientific adviser 
to the UK government, draws lessons for making research more relevant. 


The post-truth politics of 2016 has made it
a difficult year for those of us
who like to see decisions guided by
evidence. As a chief scientific adviser to the 
UK government, I would caution against 
any disengagement by the science commu- 
nity from policymaking. Science remains an 
integral part of the processes of government, 
and its outputs are increasingly needed. 

In the areas for which I have responsi- 
bilities — food and environment — there 
is an obvious need to maintain momentum 
in finding solutions, such as for mitigat- 
ing climate change. The deeper we dig, the 
more we understand the important role of 
the environment for human health and wel- 
fare, including for inequality. Achieving the 
United Nations’ Sustainable Development 
Goals requires a strong and unified approach 
by the science community, irrespective of 
populist politicians and policies. 

In reality, government is mostly preoccu- 
pied with reacting to acute events. Strategy 
swings with the political whim, the urgent 
crowds out the important and policymak-
ing can become a displacement activity — a
survival strategy that creates the impression
of progress against a background of
intractability. As a result, it is often only 
scientists in and around government who 
carry the baton for confronting the brutal 
realities of environmental challenges. 

There is a danger that, in the maelstrom 
of day-to-day policy delivery, scientific 
input will be reduced to a mere technical 
instrument. To be more involved in creating 
policy, scientists need to focus on different 
research priorities from those normally seen 
as important to their careers. 


HARDEST QUESTIONS 

This different posture is both structural and 
intellectual. Structurally, to become trusted 
components of the policy process, scientists 
have to develop a heightened appreciation of 
how government works. Working in govern- 
ment-related science needs to be valued as 
equal in importance to working in academia. 
(Having done both, I can say that this is not 
currently the case.) The constraints on how 
government scientists behave, and on what 
they can and cannot say at any particular time, 
need to be appreciated much more sharply by 
scientific colleagues and press intermediaries. 

Intellectually, we need to frame the
challenges faced by government using 
language that reveals their intrinsic value to 
scholarship and academic progress. We have 
perceptions of what ‘excellent’ science looks 
like, but often, almost by definition, this does 
not include many of the local, urgent and 
multidisciplinary questions being addressed 
in government. 

Yet governments tackle some of the most 
difficult questions facing people and the 
planet — from how a particular price for
carbon might affect employment to what
portion of the health budget should be spent on
prevention. Costing environmental degradation
into the decisions made in everyday life
stands out as one of the greatest challenges.
Feigning ignorance of these, or focusing on only
one component of them to the detriment of
building a broader understanding of how
they might be solved, happens too often in
academic science.


SYSTEM CHANGE 

In practice, governments manage sys- 
tems — farming, say, or transport. These 
include many interacting processes and
actors, natural and social. In general,
governments have little control over the
components of these systems. What they
most need scientists’ help with is under-
standing which parts can be managed, and
studying the behaviour of whole systems in
response to those changes, often through
data about system indicators and research
into their dynamics.

ILLUSTRATION BY DAVID PARKINS

For example, scientific advice about fish- 
eries tends to wrongly assume that fisheries 
management is about managing fish, when 
we can normally only manage those who 
fish. Similarly, the solution to the bovine 
tuberculosis epidemic in Britain is more 
about managing farmer behaviour than it 
is about applying well-tried epidemiologi- 
cal solutions. 

The contribution made by supply- 
side innovation — that is, inventing new 
materials, devices or structures, or probing 
the complexity of nature — is undoubtedly 
a good thing. It is often touted by politicians 
as the main way in which research adds 
value in civil society. As scientists, we often 
acquiesce to this linear view because it is the 
route through which money tends to flow. 
Politicians like it because it supports mar- 
ket activity and economic growth, and gives 
people more of what they want. 

However, for most policies to work, equal 
strides are needed in demand-side innova- 
tion (research that answers specific, prag- 
matic questions). In food and environment, 
this often involves increasing system effi- 
ciency and reducing demand. 

For example, many countries have now 
legislated for the use of low-energy light- 
ing, stimulated by supply-side innovation 
that increased light-bulb efficiency by at 
least an order of magnitude. But a lack 
of simultaneous demand-side innova- 
tion, such as through behaviour change, 
has meant that the overall power usage 
has continued to rise — we just use more 
lights. The same goes for the fuel efficiency 
of cars. Arguably, by focusing on one part 
of the problem, science has only added to 
its intractability. 


PROBLEM-SOLVING 

Tackling many of the environmental grand 
challenges will need large-scale investments 
in system models. Climate science has 
emerged from the need to forecast weather 
and so provides a template for how this could 
be done. It involves high levels of organiza- 
tional design, including a global environ- 
mental-observation network providing 
large data flows linked to ocean and atmos- 
pheric models run on high-performance 
computers. 

Many other areas of environmental 
science — air and water, food, waste, 
and biodiversity — need a similar scale 
of effort and investment. The data flows
from observational networks are emerging,
but there is insufficient coordination of 
system-model development to capitalize 
on these data. 

Achieving this requires changes in policy 
for science and social science to incentivize 
the research community to participate in 
government. This means reassessing the
value of game-changing discoveries
(supply-side innovation) relative to the
organizational, system-based solutions that are
needed (demand-side innovation).

If researchers want to play their part in 
solving major problems, such as decoupling 
economic growth from resource consump- 
tion, they need to change their focus. This 
requires greater prioritization of behav- 
ioural and operational research, a disci- 
pline that gets scant coverage in academic 
circles but which encompasses systems 
analysis and modelling. It also requires
greater value to be placed on synthesis as a
tool in discovery because of its power to
describe system-level behaviour.
Often it is the simple solutions applied well
that make the difference rather than new
technologies.

This is not the world of the laboratory
bench or the individual theoretician. It
is one in which system models are being
continually refined on the basis of big,
open data about the system’s state and its
responses. This will blur the boundaries
between experimentalists and those who
run the policies — because a policy becomes
a hypothesis. And it will turn science back
from the path of being perceived as an
irrelevant domain of the intellectual elite. Recent
growth in anti-science views on both sides
of the Atlantic suggests that this change
is imperative.


WISH LIST 

So, what are the really big systems chal- 
lenges in my areas of responsibility? First, 
in my view, we need to know much more 
about the future of resources. Raw materials 
drive the global economy; if they cannot be 
grown, they need to be mined. In response, 
we have invented solutions such as the cir- 
cular economy. Although no one doubts 
the wisdom of driving up the productivity 
associated with the materials already in the 
economy, their reuse may divert attention 
from difficult decisions about reduction. 
Better systems-based models of resources 
and materials are needed to help frame the 
policy options. 

Similarly, we need to know the level of 
assurance of our worldwide food supply. 
It is difficult for policymakers to estimate 
how much reserve is needed to create
resilience to different kinds of shocks,
natural or human-made. Most govern- 
ments currently leave this crucial func- 
tion to the market, but is this wise? Little 
is known, beyond what equilibrium eco- 
nomic models tell us, of the stability and 
resilience of food-supply networks. Many 
human-made and natural networks show 
nonlinear behaviour and have a capacity to 
reach a tipping point. Could this happen to 
global food supplies? 

An extension of this question con- 
cerns the future of livestock. This has 
very low levels of material efficiency, so 
shifting away from livestock production 
might simultaneously address concerns 
about future food supplies and resources. 
Although livestock production can be the 
best use of marginal land and is important 
in some developing countries, it is also a 
significant contributor to greenhouse gases, 
exacerbates the problems of antimicrobial 
resistance, causes pollution, increases the 
risks from diseases that are spread from 
animals to humans, and drives the destruc- 
tion of tropical forest. Furthermore, current 
levels of meat consumption in the devel- 
oped world are unhealthy. Are there differ- 
ent systemic solutions to meat production 
and consumption that address these kinds 
of problem? 

There are other priorities, of course. 
But in the interests of focusing on finding 
simple solutions and applying these well to 
achieve maximum effect, these examples 
could address many of the large-scale and 
long-term environmental challenges facing 
the planet. Intellectual resources need to 
be deployed where they will make the big- 
gest difference, and this requires leadership 
and vision. 

Politicians who are willing to listen may 
say this is all too difficult. However, scien- 
tific leadership can help policies and the 
systems they are designed to influence to 
evolve together. Pointing to small changes 
in key variables and introducing changes 
incrementally can have big effects over time. 

This is illustrated by how some coun- 
tries, including Britain and several other 
European nations, are on track to eliminate 
sending waste to landfill. Incrementally increased
taxation of landfill waste has changed behaviours
and encouraged investment in recycling and 
reuse. A simple tax applied in the right place 
and appropriately scaled has shifted the 
whole system state. It has changed behav- 
iours without stranding assets and, impor- 
tantly from a political perspective, it has not 
upset the electorate.


Ian L. Boyd is chief scientific adviser in the 
UK Department for Environment, Food and 
Rural Affairs and professor of biology at the 
University of St Andrews, UK. 

e-mail: ilb@st-andrews.ac.uk 
BOOKS & ARTS


Brooklyn Grange, a 1-hectare organic farm that spans two rooftops in New York City. 


AGRICULTURE 


Sowing the city 


As holiday feasts begin, Laura Lawson surveys the fruitful history of urban farming. 


The floating gardens of Aztec Mexico;
the Versailles kitchen garden that
supplied the table of Louis XIV of
France; hydroponic vertical farming in
Japan. Throughout history, agriculture has
been integrated into cities in various forms,
moulded by environmental conditions,
design intent and technological and
agronomic innovation. Today, an explosion in
studies on urban agriculture is broadening
perspectives. It’s illuminating the view from
Africa and Latin America in Luc Mougeot’s
2005 Agropolis (Earthscan), the varied
historical contexts in Dorothée Imbert’s edited
Food and the City (Harvard Univ. Press,
2015) and the design propositions in André
Viljoen’s 2005 CPULs: Continuous Productive
Urban Landscapes (Routledge). These
books not only normalize urban farming,
but also address how it can be adapted to
tackle food security.

According to the 1996 report Urban 
Agriculture by the United Nations Develop- 
ment Programme, an estimated 800 million 
urban dwellers worldwide grow food and 
raise livestock. However, many people still
ask whether urban agriculture can grow
enough food to be useful. The assumption 
is that farming is best as a rural practice, 
with extensive land and trained farmers to 
maximize productivity. Many others are 
increasingly concerned about the impacts 
of the agro-food system on health and the 
environment, and look to the different pos- 
sibilities of urban agriculture — addressing 
nutrition, economic development, commu- 
nity activism and environmental awareness. 
Urban farming is, in fact, rarely bracketed 
with the agro-food system, but it is crucial 
to initiatives that address food access and 
affordability. It also gives farmers the flex- 
ibility to choose crops on the basis of genetic 
diversity or cultural preferences — such as 
‘heritage’ vegetables or varieties valued by 
recent immigrants. It is not a variation on 
rural agriculture, but a distinct practice. 


GROWING TOGETHER 

Farming has been part of the structure of 
cities for millennia, with farmers supplying 
urban markets and repurposing waste such 
as manure (see ‘Capitals of cultivation’). In
The Economy of Cities (Random House,
1969), urban theorist Jane Jacobs (A. Wil- 
liams Nature 537, 614-615; 2016) argued 
that agriculture did not precede early cities: 
instead, cities inspired agriculture through 
their centrality to trade and capacity for 
innovation. Archaeological evidence from 
France to Iran — of canal and irrigation 
systems, terracing and walled enclosures — 
reveals the large investment often needed to 
sustain urban productivity. 

Colonization plans such as Spain’s Law of 
the Indies — which held for 400 years in the 
Americas and the Philippines — placed agri- 
culture close to cities such as Santo Domingo 
to ease access to crops and animals, provide 
waste management and protect food supply. 
In 1683, William Penn, founder of Pennsylvania,
planned Philadelphia as a “green country
town” where productive orchards and gardens
would ensure fresh air and reduce the risk of
fire. Later, Ebenezer Howard’s British ‘garden
cities’ incorporated allotments and farms in
‘green belts’, starting with Letchworth at the
turn of the twentieth century. In 1928, the 
German landscape architect Leberecht Migge 
proposed to integrate allotment gardens into
public housing developments around an 
expanding Frankfurt. And US architect Frank 
Lloyd Wright planned a decentralized settle- 
ment (never realized) called Broadacre City, 
in which each family was allotted almost half 
a hectare for food production. 

Whether informal or planned, agricul- 
ture has had to adapt to urban expansion. In 
some cases, this has led to intensive, innova- 
tive cultivation to increase yields and support 
higher-value goods. From the twelfth to the 
nineteenth centuries, farmers in the Marais 
district of Paris transformed swampland into 
productive gardens and manipulated micro- 
climates to grow peas, artichokes, figs and 
apricots. They planted densely in raised beds 
nourished with (and kept warm by) horse 
manure, and used cold frames and cloche 
coverings to extend the growing season, as 
well as trellis supports to encourage fruit 
yields. Yet even with innovation and intensi- 
fication, rising land costs and improved trans- 
port often led farmers in Paris and elsewhere 
to relocate to otherwise unbuildable sites, or 
out into the urban periphery. 

City farming has also been a way to relieve 
poverty. The nineteenth and twentieth cen- 
turies saw English urban allotments, French 
jardins familiaux and jardins ouvriers, German
Kleingärten and, in US cities such as New
York, vacant-lot cultivation associations and 
relief gardens that provided land and training 
as both direct aid and social reform. Garden- 
ing was about food, but also about tradition, 
culture, education and morale, for instance 
harnessing the know-how of immigrants. 
These programmes paved the way for domestic
food production in Australia, Europe
and North America during the two world 
wars, when hundreds of thousands of civil- 
ians grew their own food so that commercial 
crops could serve the war effort. Second > 


Aztecs farming on an artificial island called a
chinampa in Tenochtitlan.
Books in brief


A Tale of Seven Scientists and a New Philosophy of Science

Eric Scerri OXFORD UNIVERSITY PRESS (2016) 

Is scientific discovery all about starry intellects and eureka 
moments? Philosopher of science Eric Scerri asserts otherwise in 
this thoughtful treatise on research as evolution, not revolution — 
collective, piecemeal endeavour rather than heroic act. Scerri 
spotlights seven ‘unknowns’ who, pivoting around the work of Niels 
Bohr and Dmitri Mendeleev, helped to unravel atomic structure. So 
physicist John Nicholson’s concept of the quantization of angular 
momentum informed Bohr’s quantum theory of the atom, and the 
amateur Anton van den Broek pioneered the idea of atomic number. 


The European Research Council

Thomas König POLITY (2016)

A beacon of science policy and research funding, the European 
Research Council (ERC) is about to celebrate its first decade. In this 
first comprehensive history, Thomas König — former scientific adviser
to ERC president Helga Nowotny — offers a multifaceted perspective. 
Behind the ERC’s success in maintaining excellence in ‘frontier’ 
research, he reveals, a welter of complex negotiations has played out
in the politicized current of its framework funding programme, Horizon 
2020. A story of big scientific personalities and struggles for autonomy 
and accountability in the charged space between policy and science. 


Human Evolution: Our Brains and Behavior
Robin Dunbar OXFORD UNIVERSITY PRESS (2016)
“Stones and bones” dominate the human-evolution story, notes
evolutionary anthropologist Robin Dunbar. In this exemplary study,
he plunges instead into the “murky, unseen social world” during
the 6 million to 8 million years since the hominin lineage diverged
from that of other African great apes. Drawing on research from the
British Academy’s From Lucy to Language project (H. Gintis Nature
509, 284-285; 2014), Dunbar traces shifts in biology, genetics and
neurology over that key period. It’s a compelling journey into human
nature, from the roots of our sociality to the rise of storytelling.


The Undoing Project: A Friendship That Changed Our Minds 
Michael Lewis W. W. NORTON (2016) 

Soon after Michael Lewis published Moneyball (W. W. Norton, 2003), 
his best-seller on the metrics of sports recruitment, he found that
the concepts were not original. Psychologists Daniel Kahneman and
Amos Tversky got there decades before, codifying the systematic bias 
in our decision-making when faced with uncertainty. Lewis tells the 
story of this rare scientific collaboration and its impact on behavioural 
economics with novelistic brio, tracing the dual evolution of Kahneman 
and Tversky’s intense relationship and their research all the way to a 
denouement of human sorrow and Nobel glory. 


The Rhinoceros and the Megatherium: An Essay in Natural History 
Juan Pimentel; transl. Peter Mason HARVARD UNIVERSITY PRESS (2017) 
In this subtly discursive study, historian of science Juan Pimentel 
looks at two animals that profoundly marked science, yet were 
“imagined without being seen”. One was the rhinoceros famously 
hypothesized by Albrecht Dürer in a 1515 woodcut. The other
was a Megatherium fossil discovered in 1788 that, rendered in an
engraving, allowed comparative anatomist Georges Cuvier to identify
it as a giant sloth. Pimentel’s inspired pairing limns how image and
imagination shape our understanding of nature. Barbara Kiser
CAPITALS OF CULTIVATION

Agriculture has been part of the
urban infrastructure since cities
first began.

Twelfth century: Farmers in the Marais,
Paris, pioneer intensive cultivation, using
horse manure for fertilizer and warmth.

Fourteenth to sixteenth centuries: The
chinampas of Tenochtitlan, Mexico —
artificial islands built out into lakes —
generate up to two-thirds of the Aztec
capital’s food.

1683: William Penn plans the city of
Philadelphia in Pennsylvania colony as
a utopian “green country town” with
productive orchards and fields.

1940s: Garden ‘armies’ in Australia,
Britain and the United States boost
domestic self-sufficiency, leaving
commercial crops to feed the war front.

1970s: Amid recession, community
gardening takes root in US cities from
New York to San Francisco, transforming
vacant lots into cropland.

1980s: With the loss of Soviet trade
subsidies, Havana reinvents urban
farming as intensive ‘organoponics’,
and reintroduces oxen for ploughing.

1990s: Aeroponics and LED lighting
allow hyperflexibility — vertical fields,
indoor farming and commercial urban
agriculture.

2000s: To support China’s urban
expansion, green belts such as that
around Shanghai grow around half of the
city’s vegetables.

2016: The US Department of Agriculture
unveils the Urban Agriculture Tool Kit,
offering technical advice for city farming.


> World War ‘victory gardens’ in Britain, 
for instance, saw everything from swathes 
of London’s Hyde Park to railway embank- 
ments do duty as cropland. Community-gar- 
den activism in the 1970s and 1980s built on 
the legacy of such programmes but focused 
on community empowerment. Movers and 
shakers emerged, such as Liz Christy, first 
director of the Council on the Environment 
of New York City’s Open Space Greening 
Program. And the movement led to neigh- 
bourhood gardens and city-wide advocacy 
organizations including Boston Urban 
Gardeners in Massachusetts. 


A MOVEABLE FEAST 

Over time, unique tensions have emerged 
around urban agriculture. Its mess, noise and 
perceived unruliness can vie with city author- 
ities’ urge towards order and cleanliness. 
Animal husbandry has been a flashpoint. 
Millions of urban dwellers in the develop- 
ing world raise everything from chickens 
to camels, but this form of farming was an 
early casualty in the West. Philadelphia’s 
free-ranging pigs, for instance, succumbed 
to early-twentieth-century legislation and the 
closure of urban livestock markets. 

The dirty reputation of farming persists in 
some Western cities. With a globalized food 
system capable of propelling asparagus from 
Peru into British supermarkets, food plan- 
ning and policy in Western cities tend to 
focus on trucking and warehousing, leaving 
officials prone to viewing localized produc- 
tion as a regressive, temporary use of urban 
space. Some urban-farm designers attempt 
to neaten up the image with colourful geo- 
metric plant displays next to street cafes and 
on rooftops. But they risk promoting style 
over substance: such plans rarely acknowl- 
edge the realities of cultivation, such as sea- 
sonality and the need for fallow beds. 

Thus urban agriculture has become a 
moveable feast — opportunistic or mandated, 
marginal or essential. Since the 1990s, initia- 
tives by bodies such as the Food and Agricul- 
ture Organization of the United Nations have 
encouraged policies, pilot projects and micro- 
lending to support farming for food security 
in many poorer African and Asian cities. And 
in socio-economically deprived postindus- 
trial regions such as Germany’s Ruhr valley 
and the Rust Belt in the US northeast and 
midwest, urban agriculture is seen as gener- 
ating economic activity and enabling access 
to better food. Growing Power, an organiza- 
tion founded by former basketball star Will 
Allen in Milwaukee, Wisconsin, provides 
training in closed-loop ‘aquaponics’ among 
other urban production practices across the 
United States (see go.nature.com/2gzhj4p). 

Elsewhere, Chinese megacities such as 
Shanghai are supporting agricultural green 
belts to retain local access to fresh vegetables. 
Cuba, long separated from global markets, 
produces roughly 60% of its vegetables in
towns and cities, with remarkable success 
based on traditional practices such as reintro- 
ducing oxen for ploughing. In 2011, San Fran- 
cisco, California, passed an urban-agriculture 
ordinance that made farming an allowable 
use of land and source of saleable food. Agri- 
culture is also included in the sustainability 
plans of cities such as Seattle, Washington, 
to reduce its carbon footprint and green the 
infrastructure. And housing developments 
including Serenbe outside Atlanta, Georgia, 
are embracing community-supported agri- 
culture, farmers’ markets and farm-to-table
cuisine as a ‘lifestyle choice’.

In tandem with the push for sustainable 
business and circular economies (B. Kiser 
et al. Nature 531, 443-446; 2016), high-tech 
options are emerging. Vertical urban farms, 
hydroponic greenhouses and aeroponic fac- 
tories offer intensive production to offset 
high real-estate costs. AeroFarms in Newark, 
New Jersey, perfected its aeroponic salad-growing
process at a local school in 2011 and now
anticipates harvests of more than 4 million
kilograms per year. But these
capital-intensive projects demand business
savvy, and critics argue that their presence 
can lead to older models such as community 
gardens being undervalued. 

Agricultural urbanism is entering a new 
phase as a framework to address community 
cohesion and food access. From Shanghai to 
Detroit, advocates are mapping the urban- 
agriculture landscape — highlighting the 
existence of vacant lots and ‘food desert’ 
neighbourhoods ripe for transformation. 
Often, this enables farmers to network, dis- 
cuss shared concerns and advocate. A model 
is New York City’s Five Borough Farm, a 
project of the Design Trust for Public Space. 
Here, site documentation, metrics develop- 
ment and proposals for supportive policies 
and practices are managed collectively. 

And the future? Urban agriculture offers 
promise for coping with climate change. 
Eschewing reliance on vulnerable transport 
connections, experimenting with seasonality 
and crop selection, and strengthening com- 
munity ties will help both mitigation and 
adaptation. Urban agriculture will be there 
when new challenges arise, and will continue 
to evolve as it responds to key issues that 
shape our cities.


Laura Lawson is dean of agriculture and 
urban programmes at Rutgers University
in New Brunswick, New Jersey. Her books
include City Bountiful and Greening Cities, 
Growing Communities (co-authored with 
Jeffrey Hou and Julie Johnson). 

e-mail: ljlawson@sebs.rutgers.edu
Study epidemiology
of fake news 


The results of the 2016 US 
presidential election and the UK 
vote to leave the European Union 
(Brexit) have raised questions 
about the influence of fake online 
news and social-media ‘echo
chambers’ (see also P. Williamson
Nature 540, 171; 2016). The 
propagation of such information 
through social networks bears 
many similarities to the evolution 
and transmission of infectious 
diseases. Analysis of transmission 
dynamics could therefore provide 
insight into how misinformation 
spreads and competes online. 

For example, disease strains 
can evolve and compete in a host
population, much like rumours, 
and infections and opinions are 
both shaped by social contacts. 
Modelling of competing disease 
strains indicates that, as contacts 
become more localized, the 
diversity of circulating strains 
can increase (see C. O. F. Buckee
et al. Proc. Natl Acad. Sci.
USA 101, 10839-10844;
2004). Network structure can
also suppress the invasion
of new disease strains (see
G. E. Leventhal et al. Nature
Commun. 6, 6101; 2015).

As more people turn to social 
networks as a primary news 
source, transmission models 
combined with appropriate 
data could help in exploring 
the dynamics of this new media 
landscape. 

Adam Kucharski London School 
of Hygiene & Tropical Medicine, 
UK. 
adam.kucharski@lshtm.ac.uk 


Be wary of ‘ethical’ 
artificial intelligence 


Jim Davies’s suggestion that we
program ethics into artificial
intelligence meta-systems as a 
safeguard could well backfire — 
by compromising our abilities 
to judge ethical implications 
(Nature 538, 291; 2016). 

In an earlier version of the
future, robot lawnmowers and
kitchen appliances promised
us more leisure time. We
now face the spectre of mass
human displacement from a 
consumption-based economy 
by equipment that can do things 
much more efficiently than 
people can. 

The ‘age of information’
promised global connectivity,
but this has wrought distraction 
to a point at which only lurid 
excesses can focus our undivided 
attention on the society to which 
we all belong. 

And as computer-generated 
imagery colonizes our 
imaginations, many are barely 
swayed by real violence (the 
wanton destruction of Syrian 
cities comes to mind). There is 
even evidence that video gaming
driven by computer-generated
imagery can alter players’
perception of acceleration
and gravity (see, for example,
A. B. Ortiz de Gortari and
M. D. Griffiths Int. J. Hum.
Comput. Interact. 30, 95-105;
2014) — compromising their
decision-making skills in a world
where real physics is the law. Such 
trends don’t bode well for ‘ethical’ 
computers. 

Michael Stocker Ocean 
Conservation Research, 
Lagunitas, California, USA. 
mstocker@ocr.org 


New contender for 
most lethal animal 


Of some 3,500 species of 
mosquito, those of the genus 
Anopheles are widely considered 
to be the most dangerous because 
they transmit malaria. Malaria is 
decreasing, however, and other 
mosquito-borne diseases, such 
as dengue, chikungunya, Zika 
and yellow fever, are increasing 
(S. V. Mayer et al. Acta Tropica 
166, 155-163; 2017). The 
mosquito Aedes aegypti, the 
primary carrier of these viruses, 
now constitutes an even greater 
threat (see also Nature 539, 
17-18; 2016, and S. F. Dowell
et al. Nature 540, 189-191; 2016). 
Aedes mosquitoes have been
transferring these viruses among
African primates for millennia. 
One African primate (Homo 
sapiens) and one African Aedes 
(A. aegypti) have spread from 
Africa. Viruses adapted to both 
have spread with them. Yellow 
fever hit the developed world in 
the seventeenth century, dengue 
in the nineteenth, chikungunya 
in the twentieth, and now Zika in 
the twenty-first. 

Scientists studying mosquito- 
borne viruses have catalogued 
hundreds more lurking in Africa. 
The world needs to take notice 
before these take hold and 
spread further. As Pliny the Elder 
(AD 23-79) wrote, “Ex Africa 
semper aliquid novi” (‘there is 
always something new coming 
out of Africa’). 

Jeffrey R. Powell Yale University, 
New Haven, Connecticut, USA. 
jeffrey.powell@yale.edu 


Journals, agree on 
manuscript format 


An ‘incorrectly’ formatted 
manuscript submission risks 
immediate bounceback by 
the authors’ chosen journal, 
irrespective of the value of its 
content. In my view, it would 
save time and frustration if the 
scientific community could agree 
on a uniform style for all journals.
There is no inherent advantage 
in customized formatting of 
references, for example, whether 
cited as F. R. Smith, P. Y. Young
and G. T. Jones J. Interest. Sci. 
2016, 85, 6700-6782, or as Smith, 
FR, Young, PY, Jones, GT (2016) 
J. Interest. Sci. 85: 6700-6782, or 
using other arbitrary variants in 
style and positioning of initials, 
year of publication and page span. 
Research papers in the natural 
sciences are typically presented 
under the headings Abstract, 
Introduction, Methodology, 
Results, Discussion, Conclusions. 
Some journals do not use 
Abstract or Introduction 
headings; some put the 
Methodology section after the 
rest. No journal so far puts the 
title at the end of the paper. 


Journals presumably insist 
on individual formatting styles 
as a distinguishing feature. 
I see no scientific merit in
doing so. Cosmetic treatments 
should instead be reserved 
for enhancing the clarity of a
manuscript’s content. 
Quanmin Guo University of 
Birmingham, UK. 
q.guo@bham.ac.uk 


Vulture restaurants 
cheat ecosystems 


Vulture ‘restaurants’ across
southern Europe are serving 
up carcasses in an attempt to 
rescue these endangered birds. 
We contend that such outlets 
are no true replacement for the 
naturally random food pulses 
associated with wildlife carrion. 
When food is only randomly 
available, it supports the foraging 
of hundreds of invertebrate and 
vertebrate scavenger species, 
promoting co-existence of 
multiple species and thus 
ecosystem balance. By contrast, a 
predictable food supply tends to 
benefit only selected species (see, 
for example, A. Cortés-Avizanda 
et al. Front. Ecol. Environ. 14, 
191-199; 2016) — griffon 
vultures (Gyps fulvus) in this case. 
Furthermore, large predators 
are now expanding in Europe, 
contributing to the rewilding 
of many landscapes. Predation 
helps to stabilize food chains 
by buffering oscillations in the 
availability of carrion, which 
are linked to climate events and 
disease epidemics. Scavenger 
species that recover as a result 
of such ecological rewilding 
boost ecosystem functions and 
services derived from their 
feeding behaviour, for example 
by reducing the spread of disease. 
We should stop providing 
services to nature through 
vulture restaurants and allow 
nature to provide services to us. 
Ainara Cortés-Avizanda, 
Henrique Miguel Pereira 
CIBIO-InBIO, University of 
Porto, Vairão, Portugal.
hpereira@idiv.de 
For News & Views online, go to
nature.com/newsandviews 


Pathways of parallel progression 


Two studies in mice identify mechanisms by which tumour cells disseminate in very early breast cancer. Both show that 
these cells colonize distant tissues more efficiently than their later counterparts. SEE ARTICLE P.552 & LETTER P.588 


CYRUS M. GHAJAR & MINA J. BISSELL 


The conventional linear model of cancer
progression states that the cells of a
developing tumour gradually pick up
genetic mutations, with cells that accumu- 
late optimal variants eventually acquiring the 
ability to migrate to and colonize other tissues 
in the body as metastases. This theory has a 
huge influence on current views of personal- 
ized medicine — for evidence, look no further 
than the US Cancer Moonshot initiative¹,
which recommends extensive sequencing of 
primary tumours to predict and understand 
treatment resistance. This represents a reason- 
able approach only if the disseminated cancer 
cells (DCCs) that form metastases are derived 
from cells that populate the primary tumour 
around the time of its detection. But several 
lines of evidence indicate that tumour cells
can leave the primary site very early during 
tumour progression and evolve indepen- 
dently at the metastatic site. Online in Nature, 
two papers¹⁰,¹¹ shed light on the mechanisms
of early dissemination for the first time. Strik- 
ingly, they offer the first evidence that these 
early DCCs are more metastasis-competent 
than cells that leave the tumour at later stages. 


The first compelling data to call the linear- 
progression model into question came from 
studies of human breast cancer. Genetic analysis 
of primary breast tumours and correspond- 
ing DCCs showed that, at the time of tumour 
detection, DCCs had fewer genetic alterations 
than primary cells, implying that DCCs seed 
the bone marrow early in disease progression, 
and evolve separately. This theory of ‘parallel
progression’ was supported by the revelation
that 20-30% of patients classified as having
‘non-invasive’ breast cancer have DCCs in
their bone marrow. Because up to 8% of ‘non-
invasive’ breast cancers recur at distant sites,
it was assumed that at least some of these early 
DCCs had metastasis-initiating potential. 

Animal models of breast cancer cast further 
doubt on the linearity of metastasis. In a mouse 
model in which breast cancer is driven by 
overexpression of the gene Her2, DCCs were 
detectable in the bone marrow by four weeks 
of age, just after Her2 expression begins.
Cancerous alterations in the mammary gland
are detectable only by electron microscopy at
this stage, and palpable primary tumours do
not develop for another 14 weeks.

Despite these data, the molecular mecha-
nisms that underlie the early dissemination
of tumour cells have not been identified. To 
address this, Hosseini et al.¹⁰ and Harper et al.¹¹
returned to the Her2-driven breast-cancer 
mouse model. 

Hosseini et al. (page 552) analysed gene 
expression in epithelial cells that line the ducts 
of the mammary gland, isolated before the 
mice reached nine weeks of age, when tumours 
are not yet palpable. Their analysis suggested 
that the hormone progesterone drives dissemi- 
nation from these microscopic early tumours. 
The authors showed that progesterone triggers 
secretion of the proteins WNT4 and RANKL 
from cells expressing the progesterone receptor 
(PGR), and that these signals imbue epithelial 
cells that do not express PGR with increased 
migratory potential (Fig. 1a).

The effect of progesterone becomes less 
apparent as tumour development progresses. 
The authors found that both higher cellu- 
lar density and the increased HER2 protein 
levels driven by this change induce expression 
of microRNA molecules that inhibit the gene 
encoding PGR. In effect, a dissemination-to- 
proliferation switch occurs when a developing 
tumour region becomes crowded enough. 

Harper et al.¹¹ (page 588) identified an
invasive cell population in microscopic 


Figure 1 | Mechanisms of early dissemination and metastasis in breast 
cancer. Two papers outline molecular mechanisms by which cancerous 
epithelial cells in the mammary glands of mice, driven by overexpression of 
the gene Her2, disseminate from the gland to future sites of metastasis 
during very early stages of tumour development. a, Hosseini et al.¹⁰ find that
WNT4 and RANKL proteins are secreted from cells in very early tumours
that express the progesterone receptor (PGR) protein (PGR⁺ cells). Nearby
cells that do not express PGR (PGR⁻) become migratory owing to the
secreted signals and invade the bone marrow as early disseminated cancer
cells (DCCs). b, Harper et al.¹¹ report that increases in the WNT signalling
pathway lead to inhibition of the protein p38 and transition of early-tumour 
cells to an invasive DCC state. c, Cells from late-stage tumours also migrate 
to the bone marrow. Both studies provide evidence that cells that disseminate 
later in primary-tumour progression form metastases in the bone marrow 
less efficiently than early DCCs, implying that early DCCs might be a source 
of metastases. 
tumours that is characterized by Her2-driven
expression of WNT proteins. WNTs counter- 
act the activity of the enzyme p38, which is 
typically expressed by non-proliferative (dor-
mant) DCCs. Downregulation of p38 leads
to a decrease in the epithelial cell-cell junction
molecule E-cadherin and to upregulation of 
several other genes involved in epithelial-to- 
mesenchymal transition — a change in cell 
state that facilitates invasion of mammary cells 
into the bloodstream, and ultimately into other 
tissues as early DCCs (Fig. 1b). Inhibition 
of p38 significantly increases the number of 
circulating tumour cells, as well as the number 
of DCCs in both the lung and bone marrow, 
confirming this protein's role in dissemination. 

The authors found that DCCs regain some 
of their epithelial characteristics when they 
reach the lung and bone marrow. However, it 
is not clear whether dormancy-associated p38 
signalling is restored when the cells enter these 
sites. Nor is it known whether these early DCCs 
are less responsive to quiescence-inducing 
cues from their new microenvironments
than cells that leave the primary tumour at 
later stages. 

Are early DCCs less likely to become — or 
to stay — dormant when they reach distant 
sites? Perhaps so. Adopting different experi- 
mental strategies, both groups demonstrated 
that early DCCs are inferior to late DCCs 
in their ability to form primary tumours 
if implanted in another mammary gland. 
However, early DCCs are substantially more 
metastasis-competent, forming metastases 
faster and more prolifically than their later 
counterparts (Fig. 1c). 

The molecular mechanisms that drive 
dissemination from early mammary lesions 
identified in these studies might not apply 
across all subtypes of breast cancer or to other 
cancers. Nonetheless, the findings provide 
a general framework within which to study 
causality between early DCCs and metasta- 
sis — particularly for cancers in which early 
dissemination is a documented phenomenon, 
such as skin and pancreatic cancers.

These studies have major implications with 
regard to preventive therapies. First, given the 
microscopic stage of primary tumour forma- 
tion at which dissemination of metastasis- 
competent cells occurs, developing means for 
early tumour detection may not be sufficient 
to prevent metastatic disease. And, because 
cells have probably disseminated by the time 
a primary tumour is detected, targeting the 
mechanisms of early dissemination identified 
in these studies may not be a viable therapeutic 
strategy either. Thus, we should aim to target 
the characteristic properties of early DCCs 
— their long-term survival and therapeutic 
resistance.

As Hosseini et al. confirmed from their 
analysis of human tissue samples, early DCCs 
differ substantially from the primary tumour 
at the molecular level. Thus, to learn about 


early DCCs, we must increase the frequency 
with which DCCs are isolated from humans, 
profiled, and studied functionally in appro- 
priate animal and culture models. This will 
aid the identification of elusive molecular 
targets for precision metastasis-prevention 
therapies based on demonstrable steps in 
cancer progression, instead of on an assumption 
of linearity.


Cyrus M. Ghajar is in the Public Health 
Sciences Division's Translational Research 
Program and the Human Biology Division, 
Fred Hutchinson Cancer Research Center, 
Seattle, Washington 98109, USA. 

Mina J. Bissell is in the Biological Systems 
and Engineering Division, Lawrence Berkeley 
National Laboratory, Berkeley, California 
94720, USA. 

e-mails: cghajar@fredhutch.org; 
mjbissell@lbl.gov
Unexplored territory
for self-assembly 


Cage-like structures can self-assemble from suitable metal ions and organic 
linkers, but the size of the assemblies was limited. The surprise discovery of a
new series of cages opens up fresh horizons for self-assembly. SEE LETTER P.563 


FLORIAN BEUERLE 


Nature has inspired an efficient way
to synthesize large molecular aggre-
gates: self-assembly. In the natural
process, multiple copies of subunits (proteins, 
for example) spontaneously agglomerate into 
complex, hierarchical architectures such as 
virus shells (capsids). On page 563, Fujita et al.¹
report their use of self-assembly to prepare the 
largest known, artificially synthesized cage- 
type object that has a precise atomic composi- 
tion — an almost spherical shell that assembles 
from 144 components. 

Construction in the macroscopic world is 
usually associated with building sites, where 
many builders simultaneously assemble bricks 
to give shape to an architect’s design. But 
although synthetic chemists can be thought 
of as molecular architects², construction of
molecules at the nanometre scale works quite 
differently. Conventional synthetic protocols 
allow target compounds to be constructed 
only through a linear sequence of steps, and 
the intermediates formed after each step must 
be purified in a time-consuming and yield- 
reducing way. This limits both the size and 
complexity of target molecules, so that the 
most complicated structures synthesized so 


far consist of no more than several hundred 
atoms and are only a few nanometres in length 
(see ref. 3, for example). 

Chemists have therefore used self-assembly 
extensively to make molecular superstructures 
on different length scales and of diverse shapes 
and structures. In particular, metallosupra- 
molecular chemistry involves the self-assem- 
bly of bifunctional organic ligands such as 
bipyridyl molecules (which possess two bind- 
ing sites for metal ions) with ions of metals 
such as palladium (Pd²⁺). The structures that
emerge can be polyhedra, in which the metal 
ions act as vertices that are connected by edges 
formed by the organic ligands. 

If individual ligands in these compounds 
can be exchanged easily for other ligands, 
then the resulting systems can rapidly adjust 
and rearrange in solution to reach the energeti- 
cally most stable structure as the sole or major 
product. Under such dynamic conditions, the 
structure and topology of the final assembly 
are mainly governed by three factors: the pref- 
erential formation of closed-shell structures 
that maximize the number of metal-occupied 
binding sites, rather than polymeric products;
the second law of thermodynamics, which 
maximizes entropy by favouring the formation 
of many small cages at the expense of larger 
50 Years Ago


In the course of two recent 
geophysical traverses across the 
Gulf of Guinea, H.M.S. Hecla has 
discovered a number of marked 
elevations of the ocean bed 
approximately on a line between 
St. Helena and the islands of the 
Bight of Biafra ... The continuous 
graphical traces made by the 
precision depth recorder during 
the traverses indicate that these 
are rugged topographical features, 
trending north-east and south- 
west ... Measured from the abyssal 
depths from which they rise 
(2,600 fathoms in the north-west 
to 3,100 fathoms in the south-east) 
these features have an elevation 
exceeding that of the Alps ... The 
seamounts are near and may form 
part of the submerged feature, shown 
on certain American bathymetric 
charts as “The Guinea Ridge”. 
From Nature 24 December 1966 


100 Years Ago 


Soon after the outbreak of the war, 
my father, Lord Roberts, asked the 
public to lend their glasses for the 
use of the Army. After two years
I think your readers may be glad
to have some particulars of the
result of his request ... Upwards
of 26,000 glasses have been
received ... The instruments sent
comprise every type, and have been
classified and issued according
to the needs of different units.
Particularly useful have been the 
fine prismatic glasses sent, which 
have been allocated to artillery 
and machine-gun units, according 
to their power; large mounted 
telescopes for batteries, deer- 
stalking telescopes for gunners and 
snipers, and good old-fashioned 
non-prismatic racing glasses for 
detection of the nationality of 
aircraft, locating snipers signalling 
by disc, collecting wounded, and 
musketry instruction. 

From Nature 21 December 1916 
Figure 1 | Self-assembling molecular polyhedra. a, Palladium(II) ions (Pd²⁺) and bipyridyl ligands (L)
were predicted to self-assemble into five possible polyhedra that have the general formula PdₙL₂ₙ, where
n can be 6, 12, 24, 30 or 60. The first four of these have been prepared in the past two decades. b, While
trying to prepare the final member of the series shown in a, Fujita et al.¹ isolated a structure of formula
Pd₃₀L₆₀, which had an unprecedented topology. They identified this as part of a new infinite series of
polyhedra (only five members are shown), and then isolated the next member of this series, Pd₄₈L₉₆.
This achievement opens up research to identify other assemblies of increasing size and complexity.


assemblies; and the preferential formation of 
isotropic structures that have indistinguish-
able subcomponents to minimize surface 
energy and distribute local strain equally 
throughout the assembly. It therefore follows 
that the most favourable structures are highly 
symmetrical objects in the shape of Platonic 
solids (regular convex polyhedra such as 
cubes or octahedra) or Archimedean solids 
(semiregular polyhedra composed of differ- 
ent regular polygons that converge at identical 
vertices). 

Additional design constraints also apply, 
such as the need for palladium ions to bind to 
ligands in a square-planar geometry — which 
implies the convergence of exactly four edges 
at each vertex. Taken together, the constraints 
on the self-assembly of palladium ions with 
bifunctional ligands reduce the number of 
potential target structures to five cages of for-
mula PdₙL₂ₙ, in which n can be 6, 12, 24, 30 or
60, and L is the ligand (Fig. 1a).
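The ligand count in this formula is a matter of edge counting. As a short sketch, assuming (as the constraints above imply) that each Pd²⁺ vertex binds exactly four ligands and that each bifunctional ligand bridges exactly two metal ions:

```latex
% Assumptions: n square-planar Pd^{2+} vertices, four ligand edges at each;
% every bipyridyl ligand L is one edge joining exactly two Pd^{2+} ions.
\begin{aligned}
4n &= 2E_{\mathrm{L}} && \text{(each edge is counted once at each of its two ends)}\\
E_{\mathrm{L}} &= 2n && \Longrightarrow\ \mathrm{Pd}_n\mathrm{L}_{2n}\\
n &= 6,\,12,\,24,\,30,\,60 && \Longrightarrow\ \mathrm{Pd}_6\mathrm{L}_{12},\ \mathrm{Pd}_{12}\mathrm{L}_{24},\ \mathrm{Pd}_{24}\mathrm{L}_{48},\ \mathrm{Pd}_{30}\mathrm{L}_{60},\ \mathrm{Pd}_{60}\mathrm{L}_{120}
\end{aligned}
```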

Over the past two decades, Makoto Fujita’s 
research group has pioneered self-assembly by 
synthesizing representatives of the first four 
members of the series: the Pd₆L₁₂ octahedron;
the Pd₁₂L₂₄ cuboctahedron; the Pd₂₄L₄₈
rhombicuboctahedron; and the Pd₃₀L₆₀
icosidodecahedron. The type of structure
that forms depends on the ligand design. For 
example, subtle changes in the angle formed 
between the two pyridyl units in a bipyridyl 
ligand can induce Pd₂₄L₄₈ stoichiometry to
form, rather than Pd₁₂L₂₄ (ref. 8).

In the present work, Fujita et al. targeted the 
elusive Pd₆₀L₁₂₀ rhombicosidodecahedron, the
last representative of the series. However, the
authors serendipitously discovered the for-
mation of a Pd₃₀L₆₀ cage whose single-crystal
X-ray structure clearly differed from the previ-
ously reported one, and the topology of which
did not correspond to any of the Platonic and 
Archimedean solids. The authors therefore 
postulated the existence of a new series of poly- 
hedra, which had been reported as theoretical 
possibilities by mathematicians but never
observed in any natural or artificial assem- 
blies: closed-shell frameworks in which eight 
equally distributed triangles are incorporated 
into a system of squares (Fig. 1b). These struc- 
tures are reminiscent of Goldberg polyhedra, 
which consist of 12 pentagons connected by 
hexagons, and which are ubiquitous in natu- 
ral and biological systems — such as fullerene 
structures and virus capsids. 

Impressively, the authors also predicted 
that the next homologue of the series, Pd₄₈L₉₆,
would be more stable than the isolated com- 
pound, and were able to manually separate 
individual crystals of this larger assembly 
from the products of their reaction. This cage 
is by far the most complex molecular struc- 
ture of precise atomic composition to have 
been synthesized until now, and is constructed 
from 144 components through 192 individual 
metal-ligand interactions. 
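These counts can be checked with simple bookkeeping. The sketch below is a hypothetical helper (the names `tetravalent_cage` and `CageCounts` are illustrative, not taken from the paper). It assumes square-planar Pd vertices, ligands that each bridge two metal ions, and faces that are only triangles and squares. Under those assumptions Euler's formula forces exactly eight triangles, which matches the description of the new series.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CageCounts:
    """Part counts for a hypothetical closed Pd_n L_2n cage."""
    pd: int          # vertices: Pd(2+) ions
    ligands: int     # edges: bridging bipyridyl ligands
    components: int  # Pd ions + ligands
    bonds: int       # individual Pd-ligand coordination interactions
    triangles: int   # triangular faces
    squares: int     # square faces


def tetravalent_cage(n: int, triangles: int = 8) -> CageCounts:
    """Count the parts of a cage with n four-connected Pd vertices whose
    faces are `triangles` triangles plus some number of squares.

    Assumes every ligand bridges exactly two Pd ions and that no other face
    shapes occur (so, for example, the icosidodecahedron is out of scope).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    edges = (4 * n) // 2  # 4 edges per vertex, each edge shared by 2 vertices
    # Every edge borders two faces: 3*triangles + 4*squares == 2*edges
    remaining = 2 * edges - 3 * triangles
    if remaining < 0 or remaining % 4 != 0:
        raise ValueError("no triangle/square face set fits these counts")
    squares = remaining // 4
    faces = triangles + squares
    # Euler's formula for a closed, sphere-like polyhedron: V - E + F = 2
    if n - edges + faces != 2:
        raise ValueError("counts violate Euler's formula")
    return CageCounts(pd=n, ligands=edges, components=n + edges,
                      bonds=2 * edges, triangles=triangles, squares=squares)


if __name__ == "__main__":
    big = tetravalent_cage(48)
    print(big.components, big.bonds, big.squares)  # 144 192 42
```

For n = 48 the helper gives 144 components and 192 Pd-ligand interactions, the figures quoted above, with 42 square faces alongside the 8 triangles. Any triangle count other than 8 is rejected, because with four edges at every vertex and only triangles and squares as faces, V − E + F = 2 can hold only when there are exactly eight triangles.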

What is the largest cage structure that could 
be self-assembled? The extended Goldberg 
series of polyhedra provides an indefinite 
number of ever-greater structures, so in 
principle there is no intrinsic limit to size. 
However, it will be increasingly difficult to 


overcome the entropic penalties associated 
with the self-assembly of large cages, and to 
avoid the unwanted but faster formation of 
smaller cages. 

Are there likely to be any practical applica- 
tions for these giant cages? Investigations into 
the chemistry and properties of such assem- 
blies might be severely hampered by the dif- 
ficulties in synthesizing them, especially in 
bulk quantities rather than just as individual 
crystals. The structural integrity of the cages, 
both in solution and in the solid state, is also 
a crucial unknown for any applica-
tions. Nevertheless, these huge metal-organic
assemblies might encapsulate giant biomol- 
ecules such as proteins by forming host-guest 
interactions, thus stabilizing the biomolecules
and potentially allowing control of their
structures in unnatural conditions.

Apart from their value as benchmarks for
artificial self-assembly processes, Fujita and
co-workers’ structures might also inspire inter-
est from other scientific areas. For instance,
mathematicians could seek more-exotic topol-
ogies as targets for self-assembly, and biologists
might search for previously unsuspected topol-
ogies in virus capsids or other large biological
assemblies. And only time will tell whether
Fujita and colleagues’ synthetic masterpiece
will be the starting point for further journeys
into yet-unexplored chemical territory.


Florian Beuerle is at the Institute for Organic
Chemistry and in the Center for Nanosystems


CARDIOVASCULAR DISEASE

A turbulent path 
to plaque formation 


Plaque deposits often occur in curved arterial regions with turbulent blood flow. 
Endothelial cells have been found to respond to blood flow through a previously 
unidentified signalling pathway that affects plaque build-up. SEE LETTER P.579 


VEDANTA MEHTA & ELLIE TZIMA 


A key characteristic of the disease
atherosclerosis is the gradual accu-
mulation of plaque deposits on the
walls of arteries. Plaque is composed of cel- 
lular waste, fatty deposits and cholesterol 
molecules, and is not uniformly distributed 
in arteries. Some plaques can reach a size
that obstructs blood flow to organs, causing
heart attacks or strokes. On page 579, Wang
et al. propose a mechanism for plaque devel-
opment that also provides an explanation for
plaque-formation patterns. 

Blood-flow dynamics have a central role 
in atherosclerosis development, and the key 
driving force is shear stress: the frictional
force exerted on blood-vessel walls because of 
blood flow. Shear stress as a result of the uni- 
form laminar blood flow that occurs in straight 
regions of blood vessels is not considered to be 
a risk factor for plaque formation. However,
curved blood-vessel regions, including those 
near branch points, have disturbed (turbulent) 
blood-flow patterns and are more susceptible 
to plaque development.
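How large this frictional force is depends on the flow profile. As an idealized illustration only, the textbook Poiseuille solution gives the wall shear stress for steady, fully developed laminar flow of a Newtonian fluid (viscosity μ, volumetric flow rate Q) through a rigid, straight tube of radius R. These are simplifying assumptions: blood is not strictly Newtonian, and arteries are neither rigid nor straight.

```latex
% Assumed: steady Poiseuille flow, viscosity \mu, rigid straight tube of radius R, flow rate Q
\begin{aligned}
u(r) &= u_{\max}\left(1 - \frac{r^{2}}{R^{2}}\right), \qquad
Q = \int_{0}^{R} u(r)\,2\pi r\,\mathrm{d}r = \frac{\pi R^{2} u_{\max}}{2}\\
\tau_w &= \mu \left|\frac{\mathrm{d}u}{\mathrm{d}r}\right|_{r=R}
       = \frac{2\mu u_{\max}}{R}
       = \frac{4\mu Q}{\pi R^{3}}
\end{aligned}
```

This single, steady value describes only the uniform, straight-vessel case. In curved regions and near branch points, the flow profile, and hence the wall stress, varies in space and time.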

How do differences in the mechanical 
forces exerted on blood vessels result in the 
promotion or inhibition of plaque forma- 
tion? Endothelial cells line blood-vessel walls 
and can sense and distinguish laminar and 
disturbed blood-flow patterns, which results 
in changes to endothelial signalling pathways 


that ultimately determine whether plaque 
formation is promoted or inhibited.

YAP and TAZ proteins act as cellular 
sensors or checkpoints for mechani-
cal forces. These proteins are also master
regulators in the Hippo-protein-mediated 
signalling pathway, which controls organ size
and has a tumour-suppressor function. In
atherosclerotic arteries, two of the genes 
transcribed through the actions of YAP and 
TAZ are highly expressed; however, direct
evidence that links YAP and TAZ to the sens- 
ing of mechanical force by endothelial cells 
and to the development of atherosclerosis has 
been lacking. 

Activation of the YAP and TAZ pathway can 
be measured by assaying phosphorylation of 
YAP, movement of the proteins into the nucleus 
or the expression of their target genes. Using all 
three assays, Wang et al. observed the inhibi- 
tion of YAP and TAZ activity when endothelial 
cells grown in vitro were subjected to uniform, 
laminar shear stress. By contrast, YAP and TAZ 
activity was high when these cells were exposed 
to disturbed shear stress (Fig. 1). 

Wang and colleagues confirmed that YAP 
and TAZ activity is regulated by blood flow, 
using an in vivo system in which the abdominal 


Figure 1 | Endothelial-cell signalling can affect plaque formation. Laminar blood flow parallel to
the blood-vessel wall usually occurs in straight regions of arteries. Disturbed, non-laminar blood flow
occurs in curved arterial regions, including those near where a vessel branches. The endothelial cells that
line blood vessels can sense and respond to these two different types of blood flow. Regions of disturbed
blood flow are associated with deposits of plaque, an accumulation of cellular waste and fatty molecules 
that can obstruct blood flow and potentially cause disease. Wang et al. report that, in experiments using
mouse models and human tissue, endothelial cells adjacent to disturbed blood flow had high YAP and 
TAZ activity and increased plaque formation. By contrast, endothelial cells adjacent to laminar blood 
flow have low YAP and TAZ activity and do not have plaque deposits. 
artery in rats is clamped. This constriction
generates regions of uniform and disturbed 
shear stress in the same blood vessel. High
YAP activity was observed where blood flow 
was disturbed, and low YAP activity was 
seen in a region subjected to high uniform 
shear stress. 

To determine whether their findings were 
relevant to atherosclerosis in vivo, Wang and 
colleagues used a mouse model of atheroscle- 
rosis. These animals lack a protein that affects 
cholesterol metabolism and are susceptible to 
plaque formation when fed a high-fat diet.
The mice that developed atherosclerotic 
plaques had high YAP and TAZ activity in their 
arteries. The authors also examined samples of 
human atherosclerotic blood vessels and saw 
similar high YAP and TAZ activity. 

The authors then tested whether manipulat- 
ing the level of YAP affects plaque formation. 
Using a version of the atherosclerotic mouse 
model in which YAP was overexpressed in 
endothelial cells, they observed that, after 
four weeks on a high-fat diet, the mice had 
significantly increased plaque formation as 
compared with control animals. 

To determine the role of YAP in atheroscle- 
rosis mediated by disturbed blood flow, the 
authors subjected atherosclerotic mice to
disturbed shear stress through surgery to the
carotid artery. In this system, mice that had
also been genetically engineered to have lower 
endothelial YAP expression had significantly 
less atherosclerosis than did control animals. 
Atherosclerosis was also reduced, relative to
control mice, when the gene expression of TAZ
rather than YAP was decreased.

Wang et al. propose that the laminar-shear- 
stress pathway that inhibits YAP and TAZ 
comprises several molecules that participate 
in the process of mechanotransduction — the 
mechanism by which cells convert mechani- 
cal signals into biochemical responses. The 
authors found that laminar shear stress 
promotes the activation of integrin proteins, 
promotes the interaction between integrin 
β3 and the Gα13 protein and inhibits the pro-
tein RhoA, and that these signalling changes
subsequently lead to YAP inactivation. Integ-
rin β3 also has a plaque-promoting role, but
how this relates to the plaque-inhibiting role 
identified by Wang and colleagues is unknown 
and needs to be investigated. 

To explore the plaque-promoting signal- 
ling pathways associated with YAP and TAZ 
activation, Wang and colleagues conducted 
cellular analyses, including the analysis of 
messenger RNA sequences. This revealed that 
YAP and TAZ promote the activation of several 
inflammatory pathways, including the ather- 
osclerosis-promoting JNK-protein pathway. 
It is well established that atherosclerosis is a 
multifactorial disease in which inflammation 
has a crucial role. 

Drugs that lower cholesterol to prevent 
plaque formation are the most commonly 


prescribed medicines in Western countries, 
and are a first-line therapy for people who 
have cardiovascular disease. Cholesterol-low- 
ering statin drugs regulate the YAP and TAZ 
pathway. Whether these drugs protect
against plaque formation through modulation 
of the YAP and TAZ pathway was unknown. 

Wang and colleagues treated human cells 
that express constitutively active YAP and TAZ
in vitro with the statin simvastatin. They
found that the treatment did not suppress
YAP/TAZ-dependent expression of key genes
that promote inflammation and atheroscle-
rosis. Because simvastatin had no effect when
YAP and TAZ were locked in an active state,
the anti-inflammatory and anti-plaque effects
of statins are probably mediated by inhibition
of YAP and TAZ activity. This suggests that
the YAP and TAZ pathway could be considered
as a treatment target for atherosclerosis.

Atherosclerosis is a complex disease in 
which associated inflammation probably 
occurs through several different pathways, 
and this complexity presents an obstacle to 
successful clinical treatment. Is it possible to 
target YAP and TAZ specifically in endothe- 
lial cells in the arteries? Is suppression of just 
YAP and TAZ, or the molecules within the 
YAP and TAZ signalling pathway, sufficient to 
ameliorate atherosclerosis? Does targeting the 
pathway also affect the tumour-suppressing 
function of the Hippo pathway? These ques- 
tions need to be answered before a therapy to 


prevent atherosclerosis can be devised on the 
basis of YAP and TAZ inhibition.
Sharks shift their
spine into high gear


It emerges that a dogfish shark’s spine becomes stiffer as the fish swims faster, 
enabling the animal to swim efficiently at different speeds. The finding could also 
provide inspiration for the design of robotic biomaterials. 


MATTHEW A. KOLMANN & ADAM P. SUMMERS 


The languorous undulation of a shark
cruising along a reef gives little hint of
the fish’s potential to unleash a burst of
high-speed movement when pursuing prey. 
Writing in the Journal of Experimental Biology, 
Porter et al.¹ reveal how the structural prop-
erties of the non-bony, cartilaginous skeleton
of the spiny dogfish shark (Squalus acanthias) 
allow this fish to shift seamlessly between low- 
speed cruising and high-speed swimming. 

A basic principle of aquatic locomotion is 
that swift swimming requires a stiff spine.
A stiffer body decreases drag and increases 
energy efficiency (Fig. 1). By contrast, accel- 
eration requires a flexible spine to allow a fish 
to uncurl its body in a sudden rush. It was
proposed that thick fibres in a shark’s skin
increase the stiffness of the fish as it swims 
faster. This hypothesis is attractive, but has 
resisted experimental verification because of 
the difficulty of getting sharks to swim fast 
in a laboratory while attached to high-tech 
instrumentation. 

Porter et al. investigated shark propulsion 
from a different direction. Rather than con- 
sidering the fish’s exterior, they went right to 
the core of the matter: the vertebral column, 
or spine. In bony animals, the vertebrae of the 
spine are mineralized, rigid, bony structures 
that do not change shape appreciably during 
locomotion. Movement of the spine in bony 
animals occurs through changes in the shape 
of the intervertebral disks — elastic, but quite 
firm structures located between individual 
Figure 1 | Shark spine stiffness and swimming efficiency. a, Water eddies
(blue swirls) around a swimming fish create drag that slows the
animal down. Arrows indicate water moving away from the fish’s direction
of motion and contributing to drag. If a fish has a flexible spine, this
creates an inefficient swimming posture for swift movement because it
generates high drag. However, flexible spine movement can be useful for


vertebrae. These disks consist of an outer 
ring of fibrous connective tissue encasing 
a gelatinous core. By contrast, sharks have a 
mainly cartilaginous skeleton: their vertebrae, 
as well as their intervertebral disks, are made 
of cartilage, and the vertebrae are so poorly 
mineralized that they are potential sites of 
deformation. 

In previous in vivo and in vitro work by 
researchers from the same laboratory,
sharks had tiny piezo-crystal-based monitors 
implanted on their vertebrae. These sensors emit 
and detect ultrasound, allowing inter-crystal 
distances to be measured with micrometre- 
scale accuracy. The researchers showed that 
the vertebrae themselves deformed, not just 
the intervertebral disks. This earlier work 
showed that the shark vertebral column could 
not be simplified as a series of soft connectors 
between rigid blocks, and instead is a struc- 
ture that can be variably deformed across its 
entire length. 

To investigate changes to the shark spine 
during swimming, Porter et al. bent excised 
vertebral columns in the same wave-like 
motions that are observed during different 
swimming speeds. The authors used these 
in vitro experiments, together with computer 
modelling, to determine the length changes 
that occur across a spinal segment (a series of 
ten vertebrae and nine intervertebral disks) 
when the dogfish shark swims. They recorded 
the corresponding spinal deformations that 
occurred during movements ranging from the 
lazy tail wag of a cruising shark to the all-out 
burst that precedes prey capture. The overall 
result was a fine-scale understanding of the 
relationship between spinal deformation and 
swimming speed. 

Asa shark swims, the energy used to bend the 
spine is stored, and is then released when 
the spine straightens, providing energy for 
forward motion. Porter and colleagues’ work 
demonstrates that, in swimming sharks, 
deformation of the vertebrae as well as the 


fish’s speed. 


intervertebral disks contributes to the stored 
energy. The authors also observed that, as 
sharks swim faster, their spine gets stiffer. The 
bending ofa stiffer spine increases the stored 
energy that can be used to drive forward 
motion, and allows the shark to swim faster 
with greater efficiency. This is an aquatic 
equivalent of continuously variable trans- 
mission, a type of gear-change system found 
in some motor scooters that is continuously 
responsive to a wide range of speeds. 

What is the structural basis of the shark's 
remarkable varying spinal properties? Carti- 
lage is a water-laden, fibre-reinforced material. 
It is a type of in-between substance, neither 
an elastic solid such as rubber or metal, nor 
a stirrable fluid such as coffee. Instead, 

cartilage belongs 


“The overallresult ‘© the category 
was afine-scale of viscoelastic 
lerstundi materials — materi- 

th tiene tad als that resist defor- 
between spinal mation differently 
i when they change 
deformation length at different 
and swimming rates. An example 
speed. of a viscoelastic 


material is the toy 
Silly Putty, which can be drawn hair thin when 
pulled slowly, but which breaks apart before 
stretching if it is given a sudden pull. 
Intervertebral disks, being made of the 
gooey composite material cartilage, have 
the viscoelastic property of becoming stiffer 
when suddenly strained. This is why they make 
such poor shock absorbers when someone 
locks their knees straight to jump off a step. 
Nevertheless, these structures can undergo 
gradual compression. Over the course ofa day, 
human intervertebral disks are slowly com- 
pressed under the weight they bear, leaving 
people shorter at night than in the morning’. 
Porter and colleagues’ work shows that, like 
the human version, fish intervertebral disks 
display the viscoelastic property of an increase 


allowing rapid acceleration. b, Ifa fish has a stiffer spine, it encounters 

less drag and can swim efficiently at high speed. Porter et al.' studied the 
properties of the spine of the dogfish shark (Squalus acanthias) at different 
swimming speeds. The authors found that the fish’s spine becomes stiffer as 
the shark swims faster, thus enabling efficient movement depending on the 


in stiffness when the structure is rapidly 
strained. 
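The rate-dependent stiffening described above can be illustrated with a textbook viscoelastic model. The sketch below is a minimal illustration, not Porter and colleagues’ analysis: it assumes the spine behaves like a standard linear solid (a spring in parallel with a spring-and-dashpot arm), and every stiffness, relaxation time and strain value in it is hypothetical. In this model the storage modulus, and with it the usual estimate of peak elastic energy stored per bending cycle (half the storage modulus times the squared strain amplitude), rises with bending frequency, mirroring a spine that stiffens and stores more energy as tail beats speed up.

```python
import math

def storage_modulus(omega, e_relaxed, e_arm, tau):
    """Storage modulus E'(omega) of a standard linear solid (Zener) model, in Pa.

    omega     -- angular frequency of the imposed bending (rad/s)
    e_relaxed -- modulus of the spring that acts at every rate (Pa)
    e_arm     -- modulus of the spring in series with the dashpot (Pa)
    tau       -- relaxation time of the spring-dashpot arm (s)
    """
    x = (omega * tau) ** 2
    # Slow bending: the dashpot relaxes and only e_relaxed resists (E' -> e_relaxed).
    # Fast bending: the dashpot barely moves, so both springs resist (E' -> e_relaxed + e_arm).
    return e_relaxed + e_arm * x / (1.0 + x)

def peak_stored_energy(omega, strain_amplitude, e_relaxed, e_arm, tau):
    """Peak elastic energy per unit volume in a sinusoidal cycle, 0.5 * E' * eps0^2 (J/m^3)."""
    return 0.5 * storage_modulus(omega, e_relaxed, e_arm, tau) * strain_amplitude ** 2

if __name__ == "__main__":
    # Hypothetical parameters chosen for illustration, NOT measured values.
    E1, E2, TAU, STRAIN = 1.0e6, 3.0e6, 0.2, 0.02
    for tail_beat_hz in (0.5, 1.0, 2.0, 4.0):
        w = 2.0 * math.pi * tail_beat_hz
        print(f"{tail_beat_hz:3.1f} Hz: E' = {storage_modulus(w, E1, E2, TAU):.2e} Pa, "
              f"stored energy = {peak_stored_energy(w, STRAIN, E1, E2, TAU):.0f} J/m^3")
```

With these made-up numbers, faster tail beats give a higher storage modulus and more stored energy per cycle. Real cartilage is far more complex than a three-element model, so the sketch shows only the qualitative trend.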

Applications for the discoveries made by Porter et al. might arise in the world of soft and bio-inspired robotics, given that microrobots already exist that were designed with the swimming patterns and anatomy of skates, another type of cartilaginous fish, as inspiration⁴. This discovery of continuously variable transmission in sharks might enable robotic designers to construct light, energy-efficient systems for motion that would require few moving parts and have potentially low levels of wear or little need for replacement parts.

Porter and colleagues’ findings highlight the difficulty of characterizing dynamic mechanics in composite biological materials, a research area that could revolutionize design by introducing materials whose physical conformation can shift to fit differing roles. In an age of climate change and increasing environmental pollution, inventors are increasingly looking to nature for inspiration when trying to build clean and efficient machines. What better animals to choose than sharks, given that their capacity for movement has been refined over more than 420 million years⁵ of evolution?
PLANETARY SCIENCE


Frozen in darkness 


In 2014, water vapour was detected around Ceres, a dwarf planet in the asteroid belt. NASA's Dawn spacecraft, in orbit around Ceres since March 2015, subsequently found water ice on the dwarf planet, in a small, mid-latitude crater named Oxo. Now, writing in Nature Astronomy, Platz et al. show that polar regions of Ceres that lie in perpetual darkness trap and preserve water ice, providing valuable information about the processes that create water reservoirs on planetary bodies (T. Platz et al. Nature Astron. http://dx.doi.org/10.1038/s41550-016-0007; 2016).

Permanently shadowed areas on airless 
planetary bodies, usually found in craters at 
polar latitudes, have favourable conditions 
for retaining water ice. Without sunshine, 
these regions are extremely cold (a few 
tens of kelvin), and can therefore efficiently 
trap water molecules and retain them for 
geological timescales. This cold-trapping 
process is known to be active on the Moon 
and on Mercury. 
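Why a few tens of kelvin is cold enough can be seen from a rough estimate of how fast exposed ice sublimates into vacuum. The sketch below is not the method of Platz et al.; it assumes a Clausius–Clapeyron vapour-pressure curve with a constant sublimation enthalpy of about 51 kJ per mole, anchored at the triple point of water, and the Hertz–Knudsen expression for free sublimation. It ignores re-condensation, regolith cover and other details that a real cold-trap model would include.

```python
import math

R_GAS = 8.314462618           # molar gas constant, J/(mol K)
M_H2O = 0.018015              # molar mass of water, kg/mol
L_SUB = 51.0e3                # sublimation enthalpy of ice, J/mol (assumed constant)
T_TP, P_TP = 273.16, 611.657  # triple point of water: K, Pa
RHO_ICE = 917.0               # density of water ice, kg/m^3
SECONDS_PER_GYR = 1.0e9 * 365.25 * 86400.0

def vapour_pressure(t):
    """Saturation vapour pressure of ice (Pa), Clausius-Clapeyron with constant L_SUB."""
    return P_TP * math.exp(-(L_SUB / R_GAS) * (1.0 / t - 1.0 / T_TP))

def sublimation_flux(t):
    """Hertz-Knudsen mass flux from bare ice into vacuum, kg m^-2 s^-1."""
    return vapour_pressure(t) * math.sqrt(M_H2O / (2.0 * math.pi * R_GAS * t))

def ice_loss_m_per_gyr(t):
    """Thickness of exposed ice lost per billion years at a constant temperature t (K)."""
    return sublimation_flux(t) / RHO_ICE * SECONDS_PER_GYR

if __name__ == "__main__":
    for t in (60.0, 90.0, 110.0, 150.0):
        print(f"{t:5.0f} K: {ice_loss_m_per_gyr(t):.2e} m per Gyr")
```

Evaluating the loss rate at several temperatures shows it collapsing by many orders of magnitude as the surface gets colder, which is the essence of cold-trapping; the absolute numbers should be treated as order-of-magnitude only.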

Platz and colleagues analysed images 
obtained by Dawn’s Framing Camera to 
reconstruct complete maps of the shadowed 
areas of the northern polar region of Ceres 
(pictured). They find more than 600 craters 
in perennial shadow, 10 of which exhibit 
bright features. One of these features has 
a (partially) illuminated portion, allowing 
Platz et al. to perform a spectral analysis 
using Dawn’s imaging spectrometer. The 
spectra show clear signatures of water 
ice, providing definitive evidence for the 
nature of these bright deposits. 

The authors also constrain the timescale of water-ice production on Ceres using the fact that ice deposits evolve through two competing mechanisms: slow accumulation by cold-trapping and destruction by impact ‘gardening’, the overturning of planetary soil by impacts. The authors estimate that the deposits are young (not more than a few hundred thousand years old), which implies that ice delivery is continuously active. In addition, the deposits are relatively scarce (compared with, for example, those on Mercury), suggesting that Ceres is unable to retain much water or didn’t have much to begin with.
Luca Maltagliati

This article was published online on 19 December 2016.

CONDENSED-MATTER PHYSICS

Quantum mechanics in a spin

Quantum spin liquids are exotic states of matter first predicted more than 40 years ago. An inorganic material has properties consistent with these predictions, revealing details about the nature of quantum matter. SEE LETTER P.559

LEON BALENTS

The phenomenon of magnetism, discovered thousands of years ago, arises from the alignment of electron magnetic moments known as spins. But if these spins do not align, they can form a truly quantum state called a quantum spin liquid (QSL). On page 559, Shen et al.¹ report measurements of exotic spin excitations in an inorganic material (of ytterbium, magnesium, gallium and oxygen; YbMgGaO₄). The authors’ observations suggest that YbMgGaO₄ forms a QSL that is closely analogous to a state of matter associated with electrons in a metal. The appearance of electron-like particles in such a material is surprising, and indicates extraordinary quantum entanglement² of the underlying spins.

NASA/JPL-CALTECH/UCLA/MPS/DLR/IDA

[Figure 1 image: a, Ordinary magnet; b, QSL; each panel plots energy against momentum.]

Figure 1 | Identifying a quantum spin liquid. a, In a neutron-scattering experiment, an ordinary magnet can experience an excitation called a magnon (orange arrow): a spin that is flipped with respect to the ordered pattern of the surrounding spins (blue arrows) in the lattice. For a fixed value of the momentum, such an excitation has a specific energy. b, Conversely, in a quantum spin liquid (QSL), neutron scattering can create two ‘spinons’ (red arrows); the surrounding spins are paired in ‘valence’ bonds. The two spinons give rise to a continuous energy spectrum: for a fixed momentum, the energy can take a wide range of values. Shen et al.¹ report the observation of such a spectrum in the inorganic compound YbMgGaO₄, suggesting that this material could be a QSL.

The concept of a QSL was first introduced³ in 1973 by the physicist Philip Anderson, who described such a system as “resonating”, indicating the presence of quantum superposition. In quantum mechanics, reality is represented by a wavefunction in which physical states of a system can be added together like numbers in arithmetic. An extreme example of quantum superposition is Schrödinger’s famous cat, whose wavefunction is the sum of the state of the living cat and that of the dead cat. The QSL is a close relative of Schrödinger’s cat that incorporates long-range entanglement (a superposition involving many widely separated spins). According to modern quantum theory, this type of entanglement is so stable that a QSL constitutes a new phase of matter at zero kelvin⁴,⁵.
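Superposition and entanglement can be made concrete with the smallest possible example: two spins locked in a singlet, one of the ‘valence’ bonds pictured in Figure 1b. The NumPy sketch below builds that two-spin superposition and computes the entanglement entropy of one spin, which comes out as ln 2 (one bit), whereas an unentangled product state gives zero. It is an illustration only: a QSL is a superposition of very many such bonds, and its long-range entanglement cannot be captured by a two-spin calculation.

```python
import numpy as np

UP = np.array([1.0, 0.0])    # spin up along z
DOWN = np.array([0.0, 1.0])  # spin down along z

def singlet_state():
    """Valence bond (singlet): (|up,down> - |down,up>) / sqrt(2)."""
    # np.kron builds the two-spin basis state |s1 s2>; the difference is a superposition.
    return (np.kron(UP, DOWN) - np.kron(DOWN, UP)) / np.sqrt(2.0)

def product_state():
    """Unentangled reference state |up,up>."""
    return np.kron(UP, UP)

def entanglement_entropy(state):
    """Von Neumann entropy (natural log) of spin 1 after tracing out spin 2."""
    psi = state.reshape(2, 2)          # amplitude psi[s1, s2]
    rho_1 = psi @ psi.conj().T         # reduced density matrix of spin 1
    evals = np.linalg.eigvalsh(rho_1)
    evals = evals[evals > 1e-12]       # drop zero eigenvalues (0 * log 0 -> 0)
    return float(-np.sum(evals * np.log(evals)))

if __name__ == "__main__":
    print("singlet:", entanglement_entropy(singlet_state()), "ln 2 =", np.log(2.0))
    print("product:", entanglement_entropy(product_state()))
```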

The pattern of entanglement in a QSL state can be disrupted locally to form objects called quasiparticles that behave like ordinary particles. Such objects can have substantially different properties from those of the underlying microscopic spins. A dramatic example is the appearance of electron-like particles called fermions in a QSL: these particles obey Pauli’s exclusion principle (they cannot share a single quantum state), whereas spins do not. Such behaviour enables the formation of a spinon metal, a material that is a conductor of spin and heat, but an electrical insulator.

A spinon metal has been implicated in the organic crystals κ-(BEDT-TTF)₂Cu₂(CN)₃ and EtMe₃Sb[Pd(dmit)₂]₂, which have been the most-studied QSLs experimentally for more than a decade⁶. In these crystals, the presence of a spinon metal was deduced indirectly from thermodynamic and thermal-transport measurements. Their spins are arranged in an approximately triangular lattice; such a geometry prevents a simple alignment of spins that could disrupt QSL formation.

Shen and colleagues study the inorganic material YbMgGaO₄, which, like the organic crystals, contains a triangular lattice of spins (in this case, Yb³⁺ ions separated by non-magnetic layers⁷; see Figure 1 of the paper¹). However, YbMgGaO₄ is atomically dense (there is one spin per 5 square angstroms in each layer⁷; for comparison, the area per spin in κ-(BEDT-TTF)₂Cu₂(CN)₃ is more than 100 Å²; ref. 8), and each spin carries a roughly 50% larger magnetic moment than its organic counterpart⁷. These distinctions, and the absence of hydrogen, make YbMgGaO₄ a perfect candidate for neutron scattering, the gold standard of magnetic measurements.

In inelastic neutron […] excitations that have a given energy and momentum. The form of this spectral density is very different in an ordinary magnet and a QSL (Fig. 1). In a magnet, the excitation, called a magnon, is simply a spin that is flipped with respect to the ordered pattern of the surrounding spins. Such an excitation behaves like a particle and therefore has a definite relationship between its energy and its momentum. As a result, the spectral density at fixed momentum is non-zero only at specific energies. By contrast, in a QSL, the excitation is two ‘spinons’, which share the energy and momentum that are imparted to the material. Consequently, the spectral density is non-zero for a large range of energies: a continuum.
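The kinematic argument for the continuum can be checked numerically. The sketch below is an illustration under stated assumptions, not a model of YbMgGaO₄: it uses the one-dimensional spin-1/2 Heisenberg chain, whose spinons have the well-known des Cloizeaux–Pearson dispersion ε(k) = (π/2)J|sin k| with momenta restricted to [0, π], and it sets the exchange constant J to 1. Two spinons that share a fixed total momentum q can split it in many ways, so their total energy fills a band between (π/2)J sin q and πJ sin(q/2) instead of taking the single value a magnon would have.

```python
import numpy as np

def spinon_energy(k, J=1.0):
    """des Cloizeaux-Pearson spinon dispersion of the spin-1/2 Heisenberg chain."""
    return 0.5 * np.pi * J * np.abs(np.sin(k))

def two_spinon_energies(q, J=1.0, n=2001):
    """Total energies of two spinons with momenta k1, k2 in [0, pi] and k1 + k2 = q.

    Assumes 0 <= q <= pi, so k1 runs over [0, q] and k2 = q - k1.
    """
    if not 0.0 <= q <= np.pi:
        raise ValueError("this sketch assumes 0 <= q <= pi")
    k1 = np.linspace(0.0, q, n)
    k2 = q - k1
    return spinon_energy(k1, J) + spinon_energy(k2, J)

if __name__ == "__main__":
    for q in (np.pi / 4, np.pi / 2, 3 * np.pi / 4):
        e = two_spinon_energies(q)
        # A magnon would give one energy at this q; two spinons give a whole band.
        print(f"q = {q:.3f}: energies span {e.min():.3f} to {e.max():.3f} (units of J)")
```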

Shen and collaborators report the observation of such a continuum in YbMgGaO₄. The spins in this material have been shown⁷ to remain unaligned at temperatures as low as 30 mK (strongly suggesting that this feature persists to 0 K), and thermodynamic quantities such as the specific heat and magnetic susceptibility have features quite similar to those of the organic crystals. The authors’ data reveal a much more detailed view of this QSL state, showing a continuum spectrum that is devoid of any sharp excitation peaks; both the energies at which the spectrum peaks and its overall intensity depend on momentum. Shen et al. compare their material’s spectral density at 70 mK with theoretical predictions for various QSL states, finding that the best agreement is with a spinon metal.

The authors determine the nature of YbMgGaO₄ purely empirically; its QSL behaviour now needs to be understood theoretically. Microscopically, the forces between Yb³⁺ spins are direction dependent and somewhat complicated, and their effects have not been definitively determined. Moreover, the non-magnetic layers between the triangular planes in YbMgGaO₄ have a random arrangement of Mg and Ga atoms. This must induce some disorder in the spins, but its role in QSL formation is unknown.

If the authors’ interpretation is correct, their results constitute the first momentum- and energy-dependent spectroscopy of a spinon metal. Note that a simultaneously published study⁹ has reported consistent experimental results for YbMgGaO₄, and that a possible spinon continuum has been observed in the mineral herbertsmithite¹⁰, which has been suggested to be a different type of QSL. Further exploration of spinon metals could reveal rich physics: for example, spinons are predicted to interact strongly through ‘emergent gauge fields’, whose effects on the spectral density are unknown. It is exciting to imagine these beautiful theoretical ideas realized in the laboratory.
Extracts from selected News & Views
articles published this year.


RESOLUTION BEYOND THE DIFFRACTION LIMIT 
Jian-Ren Shen (Nature 530, 168-169; 2016) 


The resolution at which structures can be visualized using X-ray 
crystallography depends on the diffraction limit, which in turn depends 
on the regular arrangement of molecules within crystals. The degree 
of regularity is often low in crystals of biomacromolecules, creating a 
major barrier to visualizing these molecules at the atomic level. Ayyer 
et al. report an approach to solving molecular structures at resolutions 
higher than the diffraction limit. They propose that molecules in crystals 
can be considered as rigid units, and that disorder is caused by transla- 
tional displacements of these units from their lattice positions. These 
displacements cause scattered X-rays to produce a vague pattern of 
light and shadowy regions, which contains information about the 
molecules’ structure at resolutions beyond the limit of normal Bragg 
diffraction. The authors used this information to extend the resolution 
of a membrane-protein complex from 4.5 Å to 3.5 Å. The technique
is potentially a great step forward for those seeking high-resolution 
structural information for many ‘poorly diffracting’ protein crystals. 
Original research: Nature 530, 202-206 (2016). 


A MILESTONE IN QUANTUM COMPUTING
Stephen D. Bartlett (Nature 536, 35-36; 2016) 


Quantum-savvy entrepreneurs are already bringing the first quantum 
computer processors out of the physics laboratory and onto the market. 
But these devices are mostly designed to perform just one function and 
cannot be programmed to run different algorithms. It would therefore 
be advantageous to build a fully fledged quantum computer that could 
be programmed to run anything we might want. In particular, it might 
execute the complex quantum algorithms that researchers think will 
solve today’s intractable problems in quantum chemistry, materials 
science and data security. Debnath et al. present a small but fully pro- 
grammable quantum computer consisting of five quantum bits (qubits), 
and they demonstrate its functionality by running several simple quan- 
tum algorithms. In all of these demonstrations, the resulting error rate is 
consistent with the authors’ observations of how their qubits work in 
isolation, showing that the qubits can be used together in more-sophisticated
algorithms in the future. The next challenge for such technologies
is to demonstrate that quantum error correction can bring error rates 
down to negligible levels. 

Original research: Nature 536, 63-66 (2016). 
EVOLUTION
INSECT INVASIONS AND NATURAL SELECTION 
Amro Zayed (Nature 539, 500-502; 2016) 


The success of social-insect invaders is paradoxical, because they 
have a sex-determination system that gives rise to many sterile or 
inviable males in small founding populations. Gloag et al. show how, 
during the early stages of an invasion of the Asian honeybee Apis 
cerana, the action of natural selection can lessen this problem. Sex 
determination in bees is governed by a gene called csd. Different versions of genes are called alleles. Fertilized eggs that carry two identical alleles of csd develop into males that are sterile or inviable. The authors speculated that a form of natural selection, called balancing selection, would have a role in reducing imbalances in the frequency of csd alleles. With balancing selection, individuals that have rare csd alleles would be expected to have high fitness. Gloag et al. sequenced the csd gene in the A. cerana population to determine how it evolved during the invasion. They found that the frequency of the csd alleles started to converge on the frequency expected if balancing selection is assumed.
Original research: Nature Ecol. Evol. http://dx.doi.org/10.1038/s41559-016-0011 (2016).
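The logic of balancing selection at csd can be illustrated with a deliberately simplified deterministic model. The sketch below assumes random union of gametes, an infinitely large population and zero fitness for csd homozygotes (the sterile or inviable diploid males); it ignores haplodiploidy, multiple mating by queens and genetic drift, all of which matter in real A. cerana populations, and its starting frequencies are hypothetical. Because a rare allele is seldom paired with an identical copy, its carriers survive more often, so allele frequencies move towards equality.

```python
def next_generation(freqs):
    """Allele frequencies after one round of selection against csd homozygotes.

    Hypothetical simplified model: gametes unite at random, homozygous zygotes
    (sterile or inviable diploid males) leave no descendants, heterozygotes survive.
    """
    # Allele i is carried by surviving heterozygotes in proportion to p_i * (1 - p_i).
    weights = [p * (1.0 - p) for p in freqs]
    total = sum(weights)  # equals the heterozygote fraction 1 - sum(p^2) when sum(p) == 1
    return [w / total for w in weights]

def simulate(freqs, generations):
    """Return the frequency vectors for generations 0..generations."""
    history = [list(freqs)]
    for _ in range(generations):
        freqs = next_generation(freqs)
        history.append(freqs)
    return history

if __name__ == "__main__":
    # Hypothetical founding frequencies of four csd alleles (one common, one rare).
    history = simulate([0.5, 0.3, 0.15, 0.05], 30)
    for gen in (0, 1, 5, 30):
        print(gen, [round(p, 3) for p in history[gen]])
```

Normalizing by the summed weights, rather than by a separately computed homozygote fraction, keeps the frequencies summing to one despite floating-point rounding over many generations.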


EVIDENCE OF LIFE IN EARTH'S OLDEST ROCKS 
Abigail C. Allwood (Nature 537, 500-501; 2016) 


When did life first arise on Earth? Nutman et al. analysed 3.7-billion- 
year-old rocks in the Isua Greenstone belt in Greenland. Within the 
rocks can be seen ancient ripple marks and piles of rock fragments 
deposited during an ancient storm. In the middle of it all are structures 
resembling stromatolites: layered structures that form through micro- 
bially influenced accretion of sediment. If these are really the figurative 
tombstones of our earliest ancestors, the implications are staggering. 
Earth's surface 3.7 billion years ago was a tumultuous place. If life could
find a foothold here, and leave such an imprint that vestiges exist, even 
though only a minuscule sliver of rock is all that remains from that 
time, then life is not a fussy, reluctant and unlikely thing. Give life half 
an opportunity and it'll run with it. 

Original research: Nature 537, 535-538 (2016). 
GENERATION SOFT

Barbara Mazzolai & Virgilio Mattoli (Nature 536, 400-401; 2016)

Robots are typically used in manufacturing contexts that involve well-structured environments.
But if these machines were moved into ‘real’ environments, they would 
have to cope with uncertain situations and adapt to changing condi- 
tions — tough problems to solve using conventional technology made 
from hard materials. Robots made from soft, deformable materials 
would be better able to grasp and manipulate unknown objects, and 
to move on unstructured and rough terrains. Wehner et al. present the 
first robot that completely lacks rigid structures and control systems. 
The octopus-shaped robot has eight arms moved by a mechanism that 
relies on the expansion of embedded, inflatable components integrated 
into a fluidic-pneumatic network, powered by a liquid fuel. Although 
soft robotics is still in its infancy, it holds great promise for applications 
such as servicing and inspecting machinery, search-and-rescue opera- 
tions, and exploration. 

Original research: Nature 536, 451-455 (2016). 


FORUM: Neuroscience 
VIRTUAL REALITY EXPLORED 


(Nature 533, 324-325; 2016) 


Neuroscientists are increasingly using virtual reality (VR) to facilitate 
studies of animal behaviour, but whether behaviour in the virtual world 
mimics that in real life is a matter for debate. 


THE BEST OF BOTH WORLDS 
Matthias Minderer & Christopher D. Harvey 


VR allows researchers to define explicitly and exhaustively the sensory
cues that carry information about the virtual world. It offers the means
to add or remove sensory cues to test the contribution of each one to a
neural code, and to build up a ‘minimal set’ of stimuli needed to produce
a given behaviour or neural activity pattern. A second benefit comes 
from the ability to redefine the laws that link the subject’s actions to 
changes in its world. Third, VR increases the range of tools available to 
measure neural activity. 


A WORLD AWAY FROM REALITY
Flavio Donato & Edvard I. Moser 


Pressing concerns are raised when VR is used to study higher-order 
computations. Navigation reflects the integration of many sensory 
inputs. But in VR, these elements are often not coordinated, and 
the animal must overcome discrepancies between visual cues that 
follow movements and cues that are static in VR, such as smell. Such 
discrepancies might alter the activity of space-encoding neurons to 
reflect only information coordinated to motion, such as visually changing 
landmarks and accumulated distance, at the expense of other cues. This 
could lead researchers to overestimate the contribution of visual inputs 
to navigation. 
GRAVITATIONAL WAVES
DAWN OF A NEW ASTRONOMY
M. Coleman Miller (Nature 531, 40-42; 2016) 


Albert Einstein discovered that binary stars and other sources 
should generate gravitational waves. Unfortunately, he also found 
that any imaginable source would produce waves so weak that detection 
was inconceivable using the technology of the day. But this inconceivable 
detection has now been reported by Abbott et al. (the LIGO Scientific 
Collaboration and the Virgo Collaboration) in Physical Review Letters. 
The authors describe the detection of the signal GW150914 from gravitational
waves generated by the merger of two black holes. Astronomers
previously had only three types of messenger from space beyond our Solar 
System: photons, neutrinos and high-energy cosmic rays. Gravitational 
waves can now be added to this short list. Opening this window will reveal 
astronomical events that had only been hypothesized. The signal has also 
provided the most direct confirmation yet of the existence of event hori- 
zons — the boundaries beyond which nothing can escape a black hole’s 
gravitational field. 
Original research: Phys. Rev. Lett. 116, 061102 (2016). 


GENOMICS 
FROM SEA TO SEA
Susan L. Williams (Nature 530, 290-291; 2016) 


Eelgrass (Zostera marina) is an unlikely model for plant evolution, but is
a useful one because it has undergone major habitat shifts: it evolved from
marine algae into a terrestrial flowering plant, then moved back to the sea 
again. Olsen et al. describe the complete genome sequence of eelgrass. 
The sequence reveals that, in moving from calm lakes and ponds to the 
rough, salty ocean, eelgrass lost several key gene groups. For evolution- 
ary biologists, the genome represents a missing piece in the puzzle of 
angiosperm evolution. For marine ecologists, the genome is a powerful 
tool for uncovering the adaptations that allow the plant to thrive in a wide 
range of environmental conditions. This ability to adapt might be the key 
to surviving environmental changes such as ocean acidification, warming 
and freshening that are occurring under global climate change. 

Original research: Nature 530, 331-335 (2016). 
Redefining the invertebrate RNA virosphere


Mang Shi!?*, Xian-Dan Lin**, Jun-Hua Tian**, Liang-Jun Chen!*, Xiao Chen>*, Ci-Xiu Li’, Xin-Cheng Qin!, Jun Li®, 
Jian-Ping Cao’, John-Sebastian Eden?, Jan Buchmann’, Wen Wang!, Jianguo Xu!, Edward C. Holmes! & Yong-Zhen Zhang! 


Current knowledge of RNA virus biodiversity is both biased and fragmentary, reflecting a focus on culturable or disease- 
causing agents. Here we profile the transcriptomes of over 220 invertebrate species sampled across nine animal phyla 
and report the discovery of 1,445 RNA viruses, including some that are sufficiently divergent to comprise new families. 
The identified viruses fill major gaps in the RNA virus phylogeny and reveal an evolutionary history that is characterized 
by both host switching and co-divergence. The invertebrate virome also reveals remarkable genomic flexibility that 
includes frequent recombination, lateral gene transfer among viruses and hosts, gene gain and loss, and complex genomic 
rearrangements. Together, these data present a view of the RNA virosphere that is more phylogenetically and genomically 
diverse than that depicted in current classification schemes and provide a more solid foundation for studies in virus ecology and evolution.


RNA viruses are likely to exist in every species of cellular life’. Despite 
this ubiquity, much of our knowledge of the biodiversity and evolution 
of RNA viruses, as well as their range of genomic structures, comes 
from those viruses that can be cultured and that act as agents of disease 
in humans or economically important animals and plants. However, 
these only represent a tiny fraction of eukaryotic diversity. This sparse 
sampling is apparent from studies of invertebrate viruses. Although 
invertebrates comprise the vast majority of the Metazoa (animals), 
little is known about the nature of the ‘virosphere’ of these organisms”. 
Metagenomic studies of invertebrate viruses have only recently been 
undertaken but often reveal far greater viral biodiversity than seen 
in vertebrates*°. Arthropods, for example, commonly act as viral 
vectors and studies of arthropod RNA viruses have revealed that changes 
in genome size, structure and segmentation have occurred more fre- 
quently and on a larger scale than previously realized”, with some 
arthropod viruses likely to be ancestors of those that infect vertebrates’, 
However, these studies are of limited scope and there are still substantial 
gaps in our knowledge of RNA virus biodiversity at both the phyloge- 
netic and genomic scales for most invertebrates, a fact that may have 
important implications for our understanding of virus evolution, ecology 
and emergence’. We describe here a large-scale meta-transcriptomic 
survey of diverse invertebrate taxa aimed at revealing the hidden 
diversity of RNA viruses. The data obtained enable us to re-examine 
and re-define the invertebrate virosphere, providing a new perspective 
on the fundamental patterns and processes of viral evolution. 


RNA viruses in invertebrates 

We performed deep transcriptome sequencing on more than 220 
invertebrate species, representing 9 metazoan phyla (Arthropoda, 
Annelida, Sipuncula, Mollusca, Nematoda, Platyhelminthes, Cnidaria, 
Echinodermata, and the Chordata subphylum Tunicata), most of which 
have not previously been screened for viruses (Supplementary Table 1). 
Accordingly, we extracted total RNA from these species and prepared
87 RNA sequencing (RNA-seq) libraries for Illumina HiSeq sequencing
(Supplementary Table 1). In total, we generated 6 trillion bases of
90-100 bp paired-end reads that were assembled de novo for virus
characterization.

These transcriptome data allowed us to identify at least 1,445 phylo- 
genetically distinct virus genomes or genome segments that contained 
an RNA-dependent RNA polymerase (RdRp) domain (Supplementary 
Table 2). The majority of these virus genomes have greater than 20-fold 
coverage and are sequenced to their complete or near-complete length. 
Sequence alignments and structural comparisons revealed extensive 
sequence divergence within these newly discovered RdRp domains, 
with most sharing less than 40% amino acid identity with those RNA 
viruses described previously. 
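Two of the checks mentioned above, sequencing depth and divergence from known viruses, reduce to simple arithmetic. The sketch below is hypothetical: it estimates mean fold coverage as reads × read length / genome length, and computes percent amino-acid identity over the columns of a pairwise alignment where neither sequence has a gap. Studies define percent identity in different ways (for example, dividing by the full alignment length instead), and the text does not say which definition was used; the toy sequences and read counts are invented.

```python
def mean_coverage(n_reads, read_length, genome_length):
    """Expected fold coverage: total sequenced bases divided by genome length."""
    return n_reads * read_length / genome_length

def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)

if __name__ == "__main__":
    # Hypothetical numbers: 2,000 reads of 100 nt mapped to a 9,000-nt genome.
    print(f"{mean_coverage(2000, 100, 9000):.1f}-fold coverage")
    # Toy aligned amino-acid fragments, not real RdRp sequences.
    pid = percent_identity("GDDA-K", "GDNAYK")
    print(f"{pid:.1f}% identity; below 40%? {pid < 40}")
```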

To assess the amount of viral RNA in each library, we removed all 
rRNA reads, including those from the host species, and determined 
the proportion of the remaining sequence data that mapped to viral 
RNA. This revealed that viral RNA comprised from 0.05% to 87% 
of the total RNA sequenced (rRNA excluded) within each library, 
although the very high levels in some cases may reflect degradation 
or inefficient extraction of host RNA (Fig. 1). Each library contains 
1-20 virus species per host species and 1-6 virus species that repre- 
sent more than 0.1% of the total RNA sequenced (rRNA excluded) 
(Fig. 1). Although some libraries contain far higher numbers of virus
transcripts, most viruses are present at low RNA levels and may therefore be
associated with other cellular organisms that are present within the host
(see below).
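The library-level proportions above can be sketched as follows. The function names, read counts and virus labels are hypothetical; the sketch simply removes rRNA reads from the denominator, as described in the text, and flags viruses that exceed the 0.1% threshold.

```python
def viral_fraction(viral_reads, total_reads, rrna_reads):
    """Fraction of non-rRNA reads that map to viral contigs."""
    non_rrna = total_reads - rrna_reads
    if non_rrna <= 0:
        raise ValueError("no reads remain after removing rRNA")
    return viral_reads / non_rrna

def abundant_viruses(reads_per_virus, total_reads, rrna_reads, threshold=0.001):
    """Names of viruses whose reads exceed `threshold` (0.1% by default) of non-rRNA reads."""
    return sorted(
        name for name, n in reads_per_virus.items()
        if viral_fraction(n, total_reads, rrna_reads) > threshold
    )

if __name__ == "__main__":
    # Hypothetical library; the counts and virus names are invented.
    counts = {"virus_a": 500_000, "virus_b": 30_000, "virus_c": 5_000}
    total, rrna = 50_000_000, 30_000_000
    print(f"viral share: {viral_fraction(sum(counts.values()), total, rrna):.2%}")
    print("above 0.1%:", abundant_viruses(counts, total, rrna))
```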

These transcriptome data also contain a substantial proportion 
of transcripts that carry divergent reverse transcriptase enzymes, 
potentially derived from retrotransposons (Fig. 1). These can be 
distinguished from RNA viruses by their replicase components, 
the lack of consistently inherited structural proteins and the presence 
of DNA copies. Additionally, although RNAs produced by DNA 
viruses (including bacteriophages) and bacteria were present in the 
transcriptome data, these were generally at lower quantities than RNA 
viruses and will not be discussed further. 


¹State Key Laboratory for Infectious Disease Prevention and Control, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Changping, 100206 Beijing, China. ²Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, the University of Sydney, Sydney, New South Wales 2006, Australia. ³Wenzhou Center for Disease Control and Prevention, Wenzhou, 325001 Zhejiang, China. ⁴Wuhan Center for Disease Control and Prevention, Wuhan, 430015 Hubei, China. ⁵Guangxi Mangrove Research Center, Beihai, 536000 Guangxi, China. ⁶Systems Biology and Bioinformatics Group, School of Biological Sciences, Faculty of Sciences, University of Hong Kong, Hong Kong, China. ⁷National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention, Shanghai, China.
*These authors contributed equally to this work.
Figure 1 | The frequency and diversity of viral RNA transcripts in invertebrate transcriptomes.
Evolution of invertebrate RNA viruses

To place these newly discovered viruses within the context of known 
viral biodiversity, we collated NCBI reference virus genomes from 
all established families and floating genera of RNA viruses, as well as 
non-reference virus genomes from unclassified taxa. The RdRp is the 
only conserved-sequence domain across all RNA viruses and was there- 
fore used for phylogenetic inference. Phylogenetic analysis revealed that 
the genetic diversity of the newly discovered RNA viruses surpassed 
that described previously and could not always be easily incorporated 
into current virus classifications (Fig. 2; see Supplementary Data 1-21 
for detailed phylogenies). In particular, many of the newly discovered 
viruses occupy topological positions that fall between families or genera 
and thus fill major phylogenetic gaps, so that RNA viruses now occupy
a more continuous spectrum of phylogenetic diversity.

To better describe and accommodate the extraordinary diversity of viruses
discovered here, we merged previously defined virus families,
orders and floating genera to produce 16 clades of RNA viruses. For
simplicity, we have abbreviated these clades as ‘Astro’, ‘Birna’, ‘Hepe–Virga’,
‘Hypo’, ‘Luteo–Sobemo’, ‘Narna–Levi’, ‘Bunya–Arena’, ‘Mono–Chu’,
‘Orthomyxo’, ‘Nido’, ‘Partiti–Picobirna’, ‘Permutotetra’, ‘Picorna–Calici’,
‘Reo’, ‘Tombus–Noda’ and ‘Toti–Chryso’, reflecting the presence
of representative viral families or orders within each clade (Fig. 2
and Supplementary Data 1-20). Notably, these clades resemble, but do
not necessarily correspond to, the ‘supergroups’ of RNA viruses
proposed previously¹¹. We also identified at least five clades of RNA viruses
in which RdRp domains are so divergent that they might be considered
new virus families or orders, although phylogenetic analyses of such
divergent taxa should be treated with caution (Supplementary Data 21).
Reflecting the location of their sampling, we provisionally named these
divergent lineages after ancient Chinese states from the Chunqiu period,
specifically ‘Yuevirus’, ‘Qinvirus’, ‘Zhaovirus’, ‘Weivirus’ and ‘Yanvirus’.

Since our sample processing involves the entire individual invertebrate, 
it is possible that a substantial proportion of the viruses discovered 
here were associated with undigested food, gut microflora or parasites 
that exist within the organisms investigated. We therefore estimated 
the proportion of each viral transcript within the library and assumed 
that the more common the virus, the more likely that it was associ- 
ated with that host (although this may not equate to active infection). 
Generally, those viruses that made up a higher proportion of total RNA 
levels (>0.1% total RNA) were not closely related to those known to 
infect vertebrates, plants or fungi, suggesting that viral RNA quantity 
may be a useful indicator of their true host. To assess the likely host 
species further, we screened for endogenous virus elements (EVEs) 
related to the exogenous viruses described here¹². Although some
viruses, such as the Picornavirales, rarely possess EVEs, and little
genome data is available for species within the Annelida and Mollusca
phyla, the EVE data helped confirm the host taxon by establishing
the evolutionary ancestry of these viruses in that host. In particular, the host taxa
containing EVEs often closely matched those containing related
exogenous viruses (Supplementary Data 3, 7-9, 11, 18, 20, 21). However,
the EVE data also suggested alternative or additional hosts in a number
of cases (Supplementary Data 5, 6, 21). For example, the highly
divergent new RNA virus (Weivirus) identified in the mollusc
transcriptome was related to EVEs identified in alveolate (protist) genomes
(Supplementary Data 21). Finally, we also examined the presence
of variant genetic codes as a guide to the likely host organisms. The
most common variant genetic codes observed were the invertebrate
mitochondrial code (Supplementary Data 6, 11) and the ciliate,
dasycladacean and Hexamita nuclear code. The latter is found in the new
RNA virus Zhaovirus (Supplementary Data 21), as well as in a cluster
of viruses from the Tombus–Noda clade (Supplementary Data 19),
indicating that these viruses are more likely to be associated with
protists than with invertebrates.
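
The variant-code check described above can be illustrated with a short sketch. This is a hypothetical helper, not the authors' pipeline. It counts in-frame stop codons in a reading frame under three NCBI translation tables (1, standard; 5, invertebrate mitochondrial; 6, ciliate, dasycladacean and Hexamita nuclear) and reports which codes leave the frame uninterrupted. Only the stop-codon sets are modelled, and the function names are illustrative.

```python
# Hypothetical helper: which genetic codes leave an ORF free of internal stops?
# Only the stop-codon sets of three NCBI translation tables are modelled.
STOP_CODONS = {
    "table 1 (standard)": {"TAA", "TAG", "TGA"},
    "table 5 (invertebrate mitochondrial)": {"TAA", "TAG"},  # TGA reads as Trp
    "table 6 (ciliate, dasycladacean, Hexamita nuclear)": {"TGA"},  # TAA/TAG read as Gln
}


def internal_stops(seq: str, stops: set) -> int:
    """Count in-frame stop codons in frame 1, ignoring one terminal stop."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]  # drop any partial codon
    if codons and codons[-1] in stops:
        codons = codons[:-1]  # a terminal stop is expected, not informative
    return sum(codon in stops for codon in codons)


def compatible_codes(seq: str) -> list:
    """Names of the modelled codes under which the frame has no internal stop."""
    return [name for name, stops in STOP_CODONS.items()
            if internal_stops(seq, stops) == 0]
```

For example, `compatible_codes("ATGCAATAAGGCTAGAAATGA")` returns only the table 6 entry, because its internal TAA and TAG codons read as glutamine under that code. A real check would scan every frame, and the reverse strand for negative-sense genomes.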

Overall, the host spectrum for the RNA viruses described here is 
broad, including different phyla and sometimes different kingdoms 
(Supplementary Data 1-21). Much of our sampling was directed 
towards the Arthropoda, meaning that definite statements on host 
range cannot be made. Despite this bias, the diversity of arthropod 
viruses is notable as they appear in multiple lineages within each major 
clade (Extended Data Fig. 1 and Supplementary Data 1-21). Also of 
note were the phyla Mollusca, Annelida and Sipuncula (all members of
the superphylum Lophotrochozoa), which diverged early from the Nematoda
and Arthropoda in the metazoan phylogeny¹³. Notably, the viromes of
these phyla either contained extremely divergent viruses (such as in the 
Bunya-Arena and Orthomyxo clades; Supplementary Data 7 and 9, 
respectively) or had substantial overlap with the arthropod virome 
(for example, several viruses in the aquatic picorna-like clade 
are commonly present in both Crustacea and Lophotrochozoa; 
Supplementary Data 14). Although only a limited number of 
species were available for the remaining phyla, it is notable that the 
Platyhelminthes and Cnidaria did not contain particularly divergent 
viruses, despite their basal position within Metazoa. Finally, although 
the phylum Echinodermata and the subphylum Tunicata of Chordata 
are more closely related to vertebrates than the other invertebrate taxa 
studied here, we did not identify any viruses that were clearly ancestors 
of vertebrate-specific virus families. 
Figure 2 | Phylogenetic diversity of RNA viruses. Thirteen phylogenetic
trees representing the major clades of RNA virus RdRp domains (see main
text for definitions). Within each tree, the viruses discovered here are
shaded red, while those described previously are shaded grey. The name of
each clade is shown to the top left of each phylogeny and the names of the
families or genera within each clade are shown below the tree. Each scale bar
indicates 0.5 amino acid substitutions per site. More detailed trees for each
clade are shown in Supplementary Data 1-21, and their genome structures
are shown in Supplementary Data 22-36.


Although viruses from divergent host taxa tend to form separate 
phylogenetic groups, suggesting that these virus–host associations have
been established over long evolutionary timescales, there was generally
little resemblance between the phylogenetic histories of viruses and
their hosts, such that strict virus–host co-divergence cannot always be
assumed. Indeed, there are clear examples of cross-species transmission 
of viruses among divergent host taxa. For example, several viruses that 
infect plants (such as Tenuivirus and Fijivirus; Supplementary Data 7 
and 18, respectively) may be derived from arthropod viruses as they are 
nested within arthropod viruses on the phylogenies and the phyloge- 
netic divergence between the plant and arthropod viruses was shallow. 


Patterns of RNA virus genome evolution 

Despite the presence of conserved RdRp sequences, the evolutionary 
histories of the structural and non-structural parts of the virus 
genomes characterized here often differed substantially (Fig. 3 and 
Supplementary Data 22-36). A single RdRp clade may contain coat 
proteins from diverse clades, and vice versa (Fig. 3 and Extended Data 
Table 1), indicative of widespread recombination among structural and 
non-structural genomic regions over long evolutionary timescales. 
Such incongruence is commonly observed within and between the 
major groups of positive-sense RNA viruses (that is, the Tombus–Noda,
Luteo–Sobemo, Hepe–Virga, Permutotetra, Astro and Narna–Levi
clades). Notably, the occurrence of recombination seems to be
unaffected by genome organization. There were major differences between
the tree topologies of the RdRp and coat protein domains for
unsegmented viruses of the ‘Tombus–Noda’ clade (Extended Data Fig. 2),
and the correlation coefficient between the two genetic distance
matrices for these proteins was low (Extended Data Fig. 2b). By
contrast, the ‘Picorna–Calici’ clade had a more stable genome structure
(Supplementary Data 33) and a lower rate of genetic exchange (Extended
Data Fig. 2b). Envelope glycoproteins were also involved in
inter-virus recombination, although such events were rarer. Specifically,
we documented recombination events involving the glycoproteins of 
highly divergent virus groups, including within negative-sense RNA 
viruses, between negative- and positive-sense RNA viruses, and even 
between negative-sense RNA and DNA viruses (Extended Data Table 1 
and Supplementary Data 27, 28). 

The data also show that the evolution of structural genes involves the 
gain and loss of genes, which can occur in both segmented and unseg- 
mented viruses. We found viruses with multiple copies of structural 
genes, such as coat protein genes in the Hepe-Virga clade (Extended 
Data Fig. 3a) and glycoprotein genes in the Mono–Chu clade (Extended
Data Fig. 3b). In addition, their diverse positions on the phylogeny 
suggest that these additional gene copies were independently acquired 
through lateral gene transfer rather than being generated de novo by 
gene duplication (Extended Data Fig. 3a). In other taxa, a reduction in 
the number of genes encoding structural proteins has been observed. 
Structural genes are, for example, relatively more frequently lost in 
negative-sense RNA viruses (Extended Data Fig. 3c), and viruses with 
such ‘reduced’ genomes are found in Nematoda, Arthropoda, and 
Platyhelminthes (Supplementary Data 27-28). Across the dataset as a 
whole, gene loss most often involved the glycoprotein, although several
viruses within the Bunya–Arena clade may lack both glycoprotein and
nucleoprotein genes (Supplementary Data 27; see Methods). Genomes
lacking structural protein genes are also found in some positive-sense and
double-stranded RNA (dsRNA) viruses, such as the Endornaviridae,
Hypoviridae, Narnaviridae and Umbravirus in the Hepe–Virga, Hypo,
Narna–Levi and Tombus–Noda clades, respectively¹⁴, and we can
tentatively identify clusters of viruses whose genomes may contain only
a replicase (Supplementary Data 23, 26, 27, 31).

Figure 3 | Genetic exchange among RNA viruses. Comparison of the
phylogenetic trees of 76 representative viral genomes with different types
of structural protein (that is, four major types of capsid protein) and the
equivalent phylogenies obtained for their RdRp amino acid sequences
(eight clades as defined in the text, shown in different colours). Line
colours correspond to those of the RdRp clade as shown to the left of the
figure. Widespread recombination can be inferred when RdRp clades are
associated with different types of structural protein, and vice versa.

We identified a number of protein domains in the non-structural part
of the genome that are shared among divergent viruses, and even with
cellular organisms (Extended Data Table 1). These include the RNA
helicase, methyltransferase, exonuclease, protease, ADP-ribose binding
protein (the macro domain), dsRNA binding protein, and even the 
Escherichia coli swarming motility protein (the NADAR domain). For 
example, we identified the parallel acquisition of eukaryotic-origin exo- 
nuclease domains (Extended Data Fig. 4a). Notably, the virus genomes 
that contain these exonuclease domains were highly divergent, although 
all were present in the same host (Ligia exotica, Extended Data Fig. 4a). 
In the case of the serine protease, virus diversity appeared both within 
and outside of the phylogenetic diversity of cellular proteins (Extended 
Data Fig. 4b), indicative of independent gene acquisitions. Also of note
were the wide, but highly sporadic, phylogenetic distributions of some
of these domains (such as Macro and NADAR; Extended Data Table 1)
and their variable insertion locations in the genome (Extended Data
Fig. 4c), again pointing to multiple, independent gene acquisitions.

There have been major reconfigurations of viral genome
organization through evolutionary history, including the number and
arrangement of open reading frames, the order of structural and non-structural
genes, and the occurrence and extent of segmentation (Fig. 4 and
Supplementary Data 22-36). These features, which are often regarded
as conservative traits, in reality show great flexibility at the deep
evolutionary scale studied here. Examination of the newly discovered
viruses reveals that the evolution of segmented genomes, or the loss of
segmentation, has occurred frequently. Several virus groups show
frequent changes in segmentation, including members of the Tombus–Noda
and Mono–Chu clades, in which the break-up and reunification of
structural and non-structural genes has occurred relatively frequently (Fig. 4
and Supplementary Data 28, 34). The dynamics of segmentation were also
reflected by changes in the number of segments. For instance, the number
of segments in both the Partiti–Picobirna and Bunya–Arena clades varies
from 1 to 6 (Fig. 4). Notably, the change in segment numbers is not only
associated with gene break-up and unification, but also with the gain and 
loss of genes. We also found a tri-segmented counterpart of the normally 
bi-segmented arenaviruses that exhibited no structural gene homology 
with other members of this group (Supplementary Data 7, 27). Despite 
this flexibility, it is also the case that a genomic plan comprising multiple 
segments can be conserved for extended time periods, as is seen in the 
reoviruses and orthomyxoviruses. In one instance in the latter, we identi- 
fied a highly divergent virus from the earthworm that possessed a similar 
genomic plan to other orthomyxoviruses (with 6 segments), including 
those from vertebrates (Fig. 4 and Supplementary Data 29). 
Discussion

We have used a simple, yet powerful, metagenomic approach to 
characterize the viromes of diverse invertebrates. This approach is 
relatively unbiased, as no attempt is made to enrich viral particles 
through filtering, centrifugation and nuclease treatment. Although 
some invertebrates seemingly harboured a high proportion of viral
RNA, which evidently depends on the infection status of the host in
question, we were unable to determine whether the viruses identified
here have any impact on host biology, including as agents of disease.
Despite this, it is clear that for many invertebrates infection by multiple
RNA viruses is likely to be the norm rather than the exception¹⁵–¹⁷.

The whole-transcriptome approach employed here allowed us to
characterize the virome of a diverse array of invertebrates, providing a new
perspective on viral biodiversity. A number of virus families that were
previously known to infect only plants, fungi and protists have now been detected
in invertebrates. Viruses infecting divergent phyla were dispersed
throughout the phylogenetic trees and exhibited diverse patterns of
clustering, reflecting a complex interplay between long-term virus–host
associations, including co-divergence, and frequent host jumping¹⁸.

Despite the relatively high frequency of potential cross-species virus
transmission documented here, there were also probable cases of
long-term virus–host co-divergence. The viruses found in several species of
parasitic nematodes tend to form monophyletic clusters, within which
the phylogeny of viruses mirrors that of their hosts. In addition, the
genetic diversity of RNA viruses within the Narna–Levi clade can be
placed into three groups: those that infect bacteria (leviviruses), those
that infect mitochondria (that is, mitoviruses that utilize the
mitochondrial genetic code), and those that infect other organisms. This
separation could in theory have occurred when the α-proteobacterial
ancestors of mitochondria became intracellular symbionts¹⁹,²⁰. Finally, despite the massive
expansion of virus diversity documented here, some vertebrate viruses (such
as those from the Picornaviridae, Paramyxoviridae and Hepeviridae
families) remain monophyletic, with viruses from mammals, birds,
reptiles and fish occupying similar phylogenetic relationships to those
of their host groups, probably indicative of long-term co-divergence.

We have necessarily inferred phylogenetic trees using the relatively 
conserved gene encoding RdRp. However, the evolutionary history 
of the entire genome is evidently more complex and not necessarily 
consistent with that of the RdRp domain. Indeed, at deep evolutionary 
timescales it is easier to trace the evolutionary history of individual 
functional units, such as the RdRp, helicase, and capsid, rather than that 
of intact viral genomes. Such modular genome evolution is reflected 
in three aspects of genetic diversity. First, there is great flexibility in 
the organization of functional units, including changes in genome 
segmentation and gene order. Second, different functional ‘units’ within 
genomes can be acquired or removed independently, although such 
processes occur relatively infrequently, which is likely to reflect strong 
restrictions on virus genome size. Thus, the simplest virus genome 
may contain only the replication module, whereas the most complex 
can contain multiple structural units or accessory units. Third, there 
is clear evidence for the exchange of functional units among viruses, 
particularly for structural proteins that can seemingly move large
phylogenetic distances. Hence, the macroevolution of RNA viruses parallels
the modular evolution previously proposed for bacteriophages²¹, albeit
with differences in timescale and mechanism. However, in the face
of such abundant diversity, it is notable that none of the RNA viruses
described here has a genome that exceeds the previously defined
maximum of approximately 32 kb, probably reflecting intrinsic size
constraints owing to error-prone replication¹⁰.

By sampling a diverse range of invertebrate taxa, we have revealed 
unprecedented levels of RNA virus genetic diversity that both re-shapes 
our understanding of the patterns and processes of their evolution and 
highlights the limitations of our knowledge of what are likely to be the
most abundant organisms on earth²². A full understanding of virus
evolution and ecology will require an extensive survey of diverse host 
organisms using the types of metagenomic approach outlined here. 
METHODS


No statistical methods were used to predetermine sample size. These experiments 
were not randomized and the investigators were not blinded to allocation during 
experiments and outcome assessment. 

Sample collection and processing. This study was based on the analysis of 
87 libraries of invertebrate samples obtained from various locations in China 
(Supplementary Table 1). The sampling included land and freshwater organisms 
from Anhui, Beijing, Hubei, Xinjiang and Zhejiang provinces. Marine and coastal 
samples were obtained from Zhejiang (East China Sea) and Guangxi (South China 
Sea) provinces. 

A total of 54 of the libraries were from arthropods (phylum Arthropoda). 
Of these arthropod libraries, 19 have been described previously”, with 35 
additional libraries newly obtained here. We sampled across all four subphyla from 
the phylum Arthropoda: (i) for the subphylum Chelicerata we sampled from the 
class Merostomata (horseshoe crabs) and Arachnida (spiders and ticks); (ii) for 
the subphylum Crustacea we sampled from the classes Branchiopoda (water fleas), 
Maxillopoda (barnacles), and Malacostraca (crabs, shrimps, crayfish, woodlice, 
wharf roaches, and so on); (iii) for the subphylum Hexapoda we sampled from nine 
orders (Coleoptera, Dermaptera, Diptera, Hemiptera, Hymenoptera, Lepidoptera, 
Odonata, Orthoptera, and Siphonaptera), all within the class Insecta; and (iv) for 
the subphylum Myriapoda we sampled from the classes Chilopoda (centipedes) 
and Diplopoda (millipedes). 

All of the taxa from the phyla Nematoda and Platyhelminthes sampled here 
were parasites. The majority of the nematodes were from the order Ascaridida
(class Secernentea) and found in the stomachs of pigs (libraries HC and WHZHC) 
and birds (libraries HC, JMYJCC, SYJFC, and WHJHC). We also collected three 
divergent nematode species within the class Secernentea from the stomachs of 
mice (library WHSWHC), the thoracic cavity of birds (library XSNXC), and 
the stomachs of snakes (library XZSJSC). In addition, we included a nematode 
species that infects mosquitoes (library WCLSXC) tentatively classified in the 
genus Romanomermis (class Adenophorea). For the phylum Platyhelminthes, 
our sampling comprised one species of tapeworm (Taenia sp.) discovered in the 
liver of rodents, as well as two species of blood flukes (Schistosoma mansoni and 
S. japonicum) initially obtained from an intermediate host (Oncomelania) and 
then matured in rabbits. 

We also sampled three phyla within the superphylum Lophotrochozoa, namely 
Mollusca, Sipuncula, and Annelida. Our samples from the phylum Mollusca com- 
prised a number of marine and freshwater species, including the classes Bivalvia 
(such as clams, mussels, oysters), Gastropoda (Chinese land snails, various sea 
snails, etc.), and Cephalopoda (octopus). For the phylum Sipuncula (peanut 
worms), we sampled the most common species in the shallow waters of the 
South China Sea, namely Phascolosoma esculenta and Sipunculus nudus. For the 
phylum Annelida, we sampled three representative classes: Polychaeta 
(sandworms), Oligochaeta (earthworms), and Hirudinea (leeches). 

All the samples described above were protostomes. For the deuterostomes our 
samples comprised the phyla Chordata and Echinodermata. Within the Chordata, 
we sampled two species from the subphylum Tunicata, the invertebrate group 
most closely related to the vertebrates. Within the Echinodermata, we sampled 
representatives from the classes Echinoidea (sea urchins) and Holothuroidea 
(sea cucumbers). Finally, we examined one species from the radially symmetric 
phylum Cnidaria (sea anemones) as a representative of basal lineages within the 
Metazoa. 

All samples were captured alive and stored at −80 °C. A proportion of the
animals were left in Petri dishes (including pillworms, earthworms, centipedes,
fiddler crabs, hermit crabs, woodlice, sandworms, oysters, Paphia shells, razor
shells, Murex snails, Turritella sea snails, and Chinese land snails) or in purified
sea water (horseshoe crabs, penaeid shrimps and mantis shrimp) for up to 24 h
to allow them to clear their stomach contents before transfer to −80 °C. Sample processing often
involved the entire animal. However, for those animals with large body sizes, hard
shells, or tissues that were difficult to homogenize, dissection was performed to
obtain the entire inner organs (visceral mass for molluscs) or parts of different
inner organs (Supplementary Table 1). During the dissection, the gut contents
were intentionally excluded to reduce contamination.

Host species identification was initially carried out by experienced field 
biologists. Further confirmation was based on analysing the cytochrome c oxidase 
subunit I (COI) gene. COI sequences were first obtained from the assembled 
contigs and then by Sanger sequencing. They were subsequently compared 
against the NCBI non-redundant nucleotide database and the BOLD database 
(http://www.boldsystems.org/) for host species identification and confirmation. 

RNA library construction and sequencing. On the basis of the complexity of
the component samples, our libraries were divided into two categories: (i) simple
libraries that contained single or multiple individuals from one or two closely
related species; (ii) mixed libraries that contained multiple species from a particular
taxonomic group. For example, the library WLJQ is a mix of individuals from the 
order Decapoda sampled in the East China Sea. For some of the mixed libraries, 
we later sequenced individual species (that is, before pooling) to assist genome 
characterization. 

To construct each library, the processed samples were first washed with a
standard, sterile, RNA- and DNA-free PBS solution (GIBCO). This washing was
performed three times, and each time the solution was pipetted to agitate the
sample and remove surface organisms and material while keeping the organisms
or tissue intact. The samples were then homogenized in 500–700 μl PBS solution
using the Mixer mill MM400 (Retsch). Total RNA was extracted using TRIzol
LS reagent (Invitrogen) and subsequently purified using the EZNA Total RNA
Kit (OMEGA). Aliquots of the resultant RNA solutions were then pooled in
equal quantities and quality-checked using an Agilent 2100 Bioanalyzer (Agilent
Technologies) before library construction and sequencing. For most libraries we
used the TruSeq total RNA Library Preparation protocol (Illumina). rRNA was
removed using either the Ribo-Zero-Gold (Human-Mouse-Rat) Kit (Illumina)
or the Ribo-Zero-Gold (Epidemiology) Kit (Illumina). For five libraries we
used the TruSeq mRNA Library Preparation protocol (Illumina), which only targets
RNA with poly(A) tails, although these sequencing results were not used in the
quantification. The information on library construction methods for each pool can
be found in Supplementary Table 1. Paired-end (90 or 100 bp) sequencing of each
RNA library was performed on the HiSeq 2000 platform (Illumina). All library
preparation and sequencing was carried out by BGI Tech.

Sequence assembly and RNA virus discovery. For each library, sequencing reads
were quality trimmed and assembled de novo using the Trinity program²³ with
default parameter settings. No filtering of host/bacterial reads was performed
before the assembly. The assembled contigs were first compared (using blastx)
against the database of all reference RNA virus proteins downloaded from
GenBank, which include those within the taxonomic classes ssRNA viruses (txid
439488), dsRNA viruses (txid 35325), and Deltavirus (txid 39759). We set the
e-value to 1 × 10⁻⁵ to maintain high sensitivity and a low false-positive rate. To
detect highly divergent viruses, we performed domain-based blast by comparing 
the assembled contigs against the Conserved Domain Database (CDD) version 
3.14 with an expected value threshold of 1 x 10~?. Sequences with positive hits to 
the domain RNA_dep_RNAP (cd01699) were retained. After the initial screening, 
potential false positives were identified by (i) comparing (blastx) putative viral
contigs against the entire non-redundant protein database, and (ii) inspecting the 
sequence alignment for conserved domains. The quality-filtered viral sequences 
were incorporated into the reference protein database for a second round of blastx. 
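
The first-round screen amounts to keeping contigs that have at least one blastx hit at or below the e-value cut-off. Below is a minimal sketch of that filter. It assumes the results were written in BLAST+ tabular format (`-outfmt 6`, default 12 columns); the column names and contig IDs are illustrative. The CDD domain step and the false-positive checks are not modelled.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def candidate_viral_contigs(blast_tsv: str, max_evalue: float = 1e-5) -> dict:
    """Map each contig with a qualifying hit to its best (lowest e-value) subject."""
    best = {}
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        hit = dict(zip(OUTFMT6, row))
        evalue = float(hit["evalue"])
        if evalue > max_evalue:
            continue  # above the cut-off: not a candidate
        contig = hit["qseqid"]
        if contig not in best or evalue < best[contig][1]:
            best[contig] = (hit["sseqid"], evalue)
    return best
```

The filter keeps only the best hit per contig because the next steps (blastx against the full non-redundant database and alignment inspection) re-examine each candidate anyway.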

To identify potential retroviruses and retrotransposons, we examined the 
domain blast results for any hit to the superfamily reverse-transcriptase-like 
domain (RT_like, cl02808), excluding those related to RNA_dep_RNAP (cd01699). 
To avoid false positives, we used a higher e-value threshold (1 x 10°). All putative 
reverse transcriptases recovered were aligned to related proteins to determine the 
presence of key motifs. 

Confirmation and extension of virus genomes. Viral contigs with unassem- 
bled overlaps or from the same scaffold were merged using the SeqMan program 
implemented in the Lasergene software package v7.1 (DNAstar). Gaps were filled 
by RT-PCR and Sanger sequencing. To confirm the assembly results, reads were 
mapped back to the full length genome with Bowtie2 (ref. 24) and inspected using 
the Integrative Genomics Viewer²⁵. For genomes with novel structures or that
contained sequences originating from lateral gene transfer events, we verified the
complete or near complete viral genome by designing overlapping primers based
on the assembled sequences (Supplementary Table 2). To check that these viruses have
no DNA stage, we used PCR and Sanger sequencing to examine the DNA extracted
for the same set of viruses on which we performed RNA genome confirmation
(Supplementary Table 2). Finally, genome termini were determined by RNA
circularization or 5′/3′ RACE kits (TaKaRa) as described previously’.
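
The read-mapping check above relied on visual inspection in the Integrative Genomics Viewer. Purely as an illustration, regions lacking read support can also be listed from a per-position depth table. The sketch assumes three tab-separated columns (reference, 1-based position, depth), as produced by `samtools depth -a`. The function name and the `min_depth` default are hypothetical.

```python
def low_coverage_runs(depth_tsv: str, min_depth: int = 1) -> list:
    """Return (reference, start, end) runs, 1-based and inclusive, with depth < min_depth."""
    runs = []
    current = None  # the low-coverage run currently being extended, if any
    for line in depth_tsv.splitlines():
        if not line.strip():
            continue
        ref, pos, depth = line.split("\t")[:3]
        pos, depth = int(pos), int(depth)
        if depth < min_depth:
            if current and current[0] == ref and current[2] == pos - 1:
                current = (ref, current[1], pos)  # extend the open run
            else:
                if current:
                    runs.append(current)  # close a run on another reference or position
                current = (ref, pos, pos)
        elif current:
            runs.append(current)  # coverage recovered: close the open run
            current = None
    if current:
        runs.append(current)
    return runs
```

Runs reported at internal positions would flag candidate assembly errors or gaps to target with RT-PCR, while short runs at the very ends usually reflect incomplete termini that RACE is meant to resolve.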

Transcriptome annotation. For each library, we annotated the top 1,000 most
common transcripts. The abundance of each transcript was determined using the
RSEM program²⁶ implemented in Trinity. The top 1,000 most common transcripts were
then compared against four databases: (i) the non-redundant protein database (nr), 
(ii) the non-redundant nucleotide database (nt), (iii) the whole-genome shotgun 
database (wgs), and (iv) the Conserved Domain Database (CDD). The resultant 
information was used to identify the origin of sequences. Host and mitochondrial 
genes were identified using two criteria: (a) well-characterized domain/functional 
information from the domain-based blast results or, in the case of non-coding 
sequences, from the results of the blastn search against the nt database, and 
(b) the presence of identical sequences in the host genome or homologous genes 
in the genome of related host taxa. RNA virus genomes and retrotransposons were 
identified as described under the ‘Sequence assembly and RNA virus discovery’ 
section. Bacterial contigs were identified if they exhibited high nucleotide similarity
(>80%) to a particular bacterial genome. Finally, the transcripts of DNA viruses
were identified if they shared protein sequence similarity with either viral
polymerases or virus-specific genes (for example, capsid proteins) described previously.
The remaining contigs were tentatively annotated as ‘undetermined’.

Determination of additional virus genome segments. In those viruses with
multiple segments we used various strategies to search for genome segments 
other than the RdRp. The majority of such segments were found based on their 
homology to the proteins of related reference viruses. For segments that encode 
proteins with no known homologues, we used in silico approaches that collec- 
tively utilize information on RNA quantity, protein structure, and/or conserved 
genome termini. To determine that these segments belonged to the same virus, 
we checked: (i) the sequencing depth of the segments; (ii) the presence of inverted 
complementary genome termini or conserved regulatory sequences in non-coding 
regions of the genome; (iii) whether the segments were found in the same samples; 
and (iv) the phylogenetic positions of related viral proteins. For example, Wuhan 
cricket virus 2 had six potential segments, of which only two exhibited homology to 
known viruses. The remaining segments were initially identified as ‘undetermined’ 
protein-coding contigs which, like the segments encoding the RdRp and capsid, 
were the most frequent contigs in the transcriptome. Further alignment of the six 
segments revealed conserved stretches at both the 5′ and 3′ ends (confirmed by
RACE), implying they are derived from the same virus. Similarly, the five unknown 
segments of Changping earthworm virus 2, a divergent member of the ‘Orthomyxo’ 
clade, were identified by RNA quantity and the presence of the same inverted 
complementary genome termini. In addition, two of the segments were identified 
as potential glycoprotein genes because both encoded proteins had an N-terminal 
signal domain, a C-terminal or mid-point transmembrane domain and putative 
glycosylation sites. 
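
Criterion (ii), inverted complementary termini, and the conserved terminal stretches used for Wuhan cricket virus 2 can both be scored with simple string comparisons. This is a hypothetical sketch, not the authors' code. The terminal window `k` is an assumed parameter, sequences are treated as RNA (T is converted to U), and real terminal panhandles are usually only partially complementary, so the function returns a match count rather than a yes/no answer.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq: str) -> str:
    """Upper-case RNA version of a DNA or RNA string."""
    return seq.upper().replace("T", "U")


def reverse_complement(seq: str) -> str:
    return to_rna(seq).translate(COMPLEMENT)[::-1]


def terminal_complementarity(segment: str, k: int = 12) -> int:
    """Matches between the first k nt and the reverse complement of the last k nt."""
    rna = to_rna(segment)
    return sum(a == b for a, b in zip(rna[:k], reverse_complement(rna[-k:])))


def shared_5prime_terminus(segments: list, k: int = 12) -> str:
    """Longest 5' prefix (at most k nt) common to every candidate segment."""
    prefix = ""
    for column in zip(*(to_rna(s)[:k] for s in segments)):
        if len(set(column)) != 1:
            break  # first position where the segments disagree
        prefix += column[0]
    return prefix
```

A segment whose 3′ end is the exact reverse complement of its 5′ end scores `k`. In practice one would compare scores and shared prefixes across all candidate segments of a virus, together with sequencing depth, rather than apply a fixed threshold.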

Despite our best efforts to address the lack of sequence similarity, it remained 
difficult to fully characterize the genome segments in a number of divergent 
viruses. In some of these cases the viruses were likely to be unsegmented even 
though related viruses appeared to harbour multiple segments. For example, in 
the case of the nematode-associated lineage within the ‘Bunya–Arena’ clade (that
is, that including Shayang Ascaridia galli virus 1), our annotation of contigs with 
similar abundance levels suggested they were either of host origin or derived from 
other viruses, rather than representing another protein-coding segment. 

It is, however, critical to acknowledge that all segment identification is tentative 
at this stage, and requires confirmation from virus isolation. 

Estimation of viral transcript frequency. To help determine the frequency of viral 
RNAs, we estimated the percentage of reads that mapped to viral RNA within the 
transcriptome of each host. To reduce any bias caused by the unequal efficiency 
of rRNA removal during library preparation, we first removed reads that mapped 
to rRNA contigs from each library. The remaining reads were then mapped to the 
entire collection of virus sequences within the library, from which we calculated the 
overall percentage of viral reads. The proportion of individual viral RNAs was then 
estimated based on the mapping results. To confirm those results in which viral 
RNA comprised a large percentage of transcripts within the host transcriptome, 
we re-extracted the total RNA from aliquots of the original homogenates and 
performed RNA-seq library preparation and sequencing as described above. 
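Once reads have been assigned, the transcript-frequency estimate is simple arithmetic. The following is a minimal Python sketch. It assumes that the number of rRNA-mapped reads and the number of reads mapped to each viral genome have already been obtained from a read mapper, which the text does not name. The function name `viral_read_fractions` is hypothetical and is not the authors' pipeline. As in the text, fractions are expressed relative to non-rRNA reads.

```python
def viral_read_fractions(total_reads: int, rrna_reads: int,
                         viral_counts: dict[str, int]) -> tuple[float, dict[str, float]]:
    """Return the overall viral fraction and per-virus fractions of non-rRNA reads.

    total_reads  -- all reads in one library
    rrna_reads   -- reads that mapped to rRNA contigs (removed before the calculation)
    viral_counts -- virus name -> number of non-rRNA reads mapped to that virus
    """
    non_rrna = total_reads - rrna_reads
    if non_rrna <= 0:
        raise ValueError("no reads remain after rRNA removal")
    mapped = sum(viral_counts.values())
    if mapped > non_rrna:
        raise ValueError("more viral reads than non-rRNA reads")
    # Per-virus proportion, using the rRNA-depleted library as the denominator.
    per_virus = {name: n / non_rrna for name, n in viral_counts.items()}
    return mapped / non_rrna, per_virus


if __name__ == "__main__":
    # Hypothetical library: 1,000,000 reads, 600,000 of which mapped to rRNA contigs.
    overall, per_virus = viral_read_fractions(
        1_000_000, 600_000, {"virus A": 2_000, "virus B": 200})
    print(f"overall viral fraction: {overall:.4%}")
    for name, frac in per_virus.items():
        print(f"{name}: {frac:.4%}")
```

In this invented example, virus A (0.5% of non-rRNA reads) falls above the 0.1% threshold used in the naming scheme described in this section, and virus B (0.05%) falls below it.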

Inference of virus evolutionary history and virus nomenclature. To infer the
phylogenetic relationships among RNA viruses we collected all replicase proteins
translated from the virus sequence collections described above. For comparison, we
downloaded from GenBank reference virus genomes from all established families
and floating genera of RNA viruses (excluding retro-transcribing viruses), as well as
non-reference virus genomes that are not included in the current classification scheme
but are relatively closely related to the viruses discovered here. The viral replicase
sequences were then aligned using MAFFT version 7 employing the E-INS-i
algorithm (ref. 27). All alignments were trimmed so that they only contained the RdRp
and its neighbouring conserved domains. All ambiguously aligned regions were
then removed using the TrimAl program (ref. 28). For each sequence alignment, the
best-fit model of amino acid substitution was determined using ProtTest 3.4 (ref. 29).
Phylogenetic trees were then inferred using the maximum likelihood (ML) approach
implemented in PhyML version 3.0 (ref. 30), employing Subtree Pruning
and Regrafting (SPR) branch-swapping. Branch support was assessed using an
approximate likelihood ratio test (aLRT) with the Shimodaira–Hasegawa-like
procedure as implemented in PhyML.
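The alignment, trimming and tree-building steps form a linear pipeline. The sketch below only assembles command lines and does not execute them. It assumes that the standard command-line front ends of MAFFT, trimAl and PhyML are installed. Several choices are assumptions, because the text names ProtTest for model selection but reports neither the chosen model nor the trimAl settings:

- the file names;
- the trimAl `-automated1` mode;
- the placeholder `LG` model.

The manual restriction of each alignment to the RdRp and its neighbouring domains is not shown.

```python
import shlex


def phylogeny_commands(prefix: str, model: str = "LG") -> list[list[str]]:
    """Build argument lists for align -> trim -> maximum-likelihood tree.

    `prefix` names hypothetical input/output files, e.g. "rdrp" -> "rdrp.fasta".
    `model` should be the best-fit model reported by ProtTest; "LG" is a placeholder.
    """
    raw = f"{prefix}.fasta"
    aligned = f"{prefix}.einsi.fasta"
    trimmed = f"{prefix}.trimmed.phy"
    return [
        # MAFFT E-INS-i; MAFFT writes the alignment to stdout (redirect it to `aligned`).
        ["mafft", "--ep", "0", "--genafpair", "--maxiterate", "1000", raw],
        # trimAl removes ambiguously aligned columns and writes PHYLIP for PhyML.
        ["trimal", "-in", aligned, "-out", trimmed, "-phylip", "-automated1"],
        # PhyML 3: protein data, SPR moves, SH-like aLRT branch support (-b -4).
        ["phyml", "-i", trimmed, "-d", "aa", "-m", model, "-s", "SPR", "-b", "-4"],
    ]


if __name__ == "__main__":
    for cmd in phylogeny_commands("rdrp"):
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)`. Because MAFFT writes to standard output, its step also needs `stdout=` pointed at an open file for the aligned output.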

The (provisional) naming of viruses was based on the following approach: the 
name of a virus characterized by a high proportion of RNA transcripts (>0.1% of 
non-rRNA reads) contains information on the geographic location of sampling, 
the host common name (such as ‘tick’) and a virus number, whereas the name
of a virus characterized by a low proportion of RNA transcripts (<0.1% of non-rRNA
reads), for which host assignments are less certain, contains information on
geographic region of origin, closest family/genus (for example, ‘astro-like’), 
and virus number. The strain name of each virus (shown in each detailed tree, 
Supplementary Data 1-21) comprises the library abbreviation followed by its 
contig number. 
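The naming rule reduces to a branch on the 0.1% threshold. The following is a minimal sketch, assuming that the fraction of non-rRNA reads is already known for each virus. The function name is hypothetical. The text does not say how a value of exactly 0.1% is treated; here it falls into the low-abundance branch. The example fractions are invented, and only the name formats mirror names that appear in the text.

```python
from typing import Optional

THRESHOLD = 0.001  # 0.1% of non-rRNA reads


def provisional_name(fraction_non_rrna: float, location: str, number: int,
                     host: Optional[str] = None,
                     closest_taxon: Optional[str] = None) -> str:
    """Compose a provisional virus name following the rule described above.

    High-abundance viruses (> 0.1% of non-rRNA reads): location + host common name + number.
    Low-abundance viruses: location + closest family/genus with '-like' + number,
    because the host assignment is less certain.
    """
    if fraction_non_rrna > THRESHOLD:
        if not host:
            raise ValueError("a host common name is required above the threshold")
        return f"{location} {host} virus {number}"
    if not closest_taxon:
        raise ValueError("a closest family/genus is required at or below the threshold")
    return f"{location} {closest_taxon}-like virus {number}"


if __name__ == "__main__":
    print(provisional_name(0.004, "Wuhan", 2, host="cricket"))          # Wuhan cricket virus 2
    print(provisional_name(0.0002, "Beihai", 2, closest_taxon="hepe"))  # Beihai hepe-like virus 2
```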

Virus genome annotation. Potential viral open reading frames (ORFs) were 
predicted based on two criteria: (i) the predicted amino acid sequences were longer
than 100 amino acids, and (ii) if a short ORF (<200 amino acids) was
completely nested within a larger one it was not regarded as a potential ORF unless
it had a homologue in a closely related virus. The annotations of these ORFs were
mainly based on comparisons to the Conserved Domain Database. Domains that
were potentially subject to lateral gene transfer were further examined by sequence
alignment and phylogenetic analyses. For the remaining ORFs, we predicted their
potential functions by blast searches against the nr protein database with an e-value
threshold of 1 × 10^-5 and by primary protein structure prediction using the
programs SignalP, TMHMM, and NetNGlyc available through the website
(http://www.cbs.dtu.dk/services/).
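The two ORF criteria can be expressed as a small filter. The following is a minimal sketch under several assumptions:

- ORFs are already called and given as nucleotide coordinates on one strand of a contig, with the stop codon excluded.
- Nesting is judged on coordinates only, ignoring reading frame.
- Homology to a closely related virus has been precomputed as a flag.

The `Orf` class and `filter_orfs` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    name: str
    start: int                   # nucleotide start on the contig (0-based, inclusive)
    end: int                     # nucleotide end (exclusive), stop codon excluded
    has_homologue: bool = False  # hit in a closely related virus, assumed precomputed

    @property
    def aa_length(self) -> int:
        return (self.end - self.start) // 3


def filter_orfs(orfs: list[Orf], min_aa: int = 100, nested_max_aa: int = 200) -> list[Orf]:
    """Apply the two ORF criteria described in the text."""
    # Criterion (i): keep ORFs longer than `min_aa` amino acids.
    candidates = [o for o in orfs if o.aa_length > min_aa]
    kept = []
    for orf in candidates:
        # Criterion (ii): is this ORF completely nested inside a longer candidate?
        nested = any(
            other is not orf
            and other.aa_length > orf.aa_length
            and other.start <= orf.start
            and orf.end <= other.end
            for other in candidates
        )
        if nested and orf.aa_length < nested_max_aa and not orf.has_homologue:
            continue  # short nested ORF without a homologue: not a potential ORF
        kept.append(orf)
    return kept


if __name__ == "__main__":
    orfs = [
        Orf("A", 0, 3000),                       # 1000 aa
        Orf("B", 300, 750),                      # 150 aa, nested in A, no homologue
        Orf("C", 301, 751, has_homologue=True),  # 150 aa, nested, but has a homologue
        Orf("D", 3100, 3250),                    # 50 aa, too short
        Orf("E", 3300, 4200),                    # 300 aa, not nested
        Orf("F", 600, 1350),                     # 250 aa, nested but not short
    ]
    print([o.name for o in filter_orfs(orfs)])   # ['A', 'C', 'E', 'F']
```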

Analysis of recombination and lateral gene transfer. For each newly identified 
protein, we searched for any potential homologues against all RNA virus 
proteins (including those newly identified here), all DNA virus proteins, and those 
from the cellular organisms (using a subset of the nr database). We also identified 
homology if the proteins matched the same domain in the structure-based blast. 
On the basis of these results, we identified several well-established homologous 
protein clusters. We then mapped these protein clusters onto the RdRp phylogenies, 
which enabled us to identify topological inconsistencies that were likely to be the 
result of lateral gene transfer. To identify homologous recombination events, we 
compared the phylogenies of each homologous protein cluster to that of the RdRp. 
To measure the degree of phylogenetic incongruence, we transformed the two 
phylogenies into patristic genetic distance matrices and calculated the Pearson 
correlation coefficient. 
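The incongruence measure compares two trees through their pairwise tip-to-tip (patristic) distances. The sketch below computes patristic matrices and the Pearson correlation over each unique tip pair with NumPy. It assumes trees are given as simple child → (parent, branch length) dictionaries rather than parsed Newick files. The tree encoding and function names are illustrative and are not the authors' code.

```python
import numpy as np


def _distances_to_ancestors(node, parent):
    """Map `node` and each of its ancestors to the path length from `node`."""
    dist, out = 0.0, {node: 0.0}
    while node in parent:
        node, length = parent[node]
        dist += length
        out[node] = dist
    return out


def patristic_matrix(parent, tips):
    """Pairwise patristic distances between `tips`.

    `parent` maps each non-root node to (parent_node, branch_length).
    """
    ups = [_distances_to_ancestors(t, parent) for t in tips]
    n = len(tips)
    m = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # The shared ancestor with the shortest combined path is the most recent
            # common ancestor (branch lengths are assumed non-negative).
            m[i, j] = m[j, i] = min(ups[i][a] + ups[j][a] for a in ups[j] if a in ups[i])
    return m


def incongruence_correlation(d1, d2):
    """Pearson correlation between two patristic matrices with the same tip order."""
    d1, d2 = np.asarray(d1, dtype=float), np.asarray(d2, dtype=float)
    if d1.shape != d2.shape or d1.shape[0] != d1.shape[1]:
        raise ValueError("matrices must be square and of equal shape")
    iu = np.triu_indices_from(d1, k=1)  # each tip pair once, diagonal excluded
    return float(np.corrcoef(d1[iu], d2[iu])[0, 1])


if __name__ == "__main__":
    tips = ["A", "B", "C", "D"]
    rdrp = {"A": ("x", 1), "B": ("x", 1), "C": ("y", 1), "D": ("y", 1),
            "x": ("r", 1), "y": ("r", 1)}
    # Same topology as the RdRp tree, longer terminal branches.
    capsid_same = {"A": ("x", 2), "B": ("x", 2), "C": ("y", 2), "D": ("y", 2),
                   "x": ("r", 1), "y": ("r", 1)}
    # Conflicting topology ((A,C),(B,D)).
    capsid_swap = {"A": ("x", 1), "C": ("x", 1), "B": ("y", 1), "D": ("y", 1),
                   "x": ("r", 1), "y": ("r", 1)}
    ref = patristic_matrix(rdrp, tips)
    print(incongruence_correlation(ref, patristic_matrix(capsid_same, tips)))  # close to 1
    print(incongruence_correlation(ref, patristic_matrix(capsid_swap, tips)))  # close to -0.5
```

A correlation near 1 indicates congruent trees. Lower values flag the kind of topological conflict that the text attributes to recombination or lateral gene transfer.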

Analysis of endogenous virus elements. All genomes from cellular organisms 
available in GenBank were downloaded and incorporated into our local genome 
nucleotide database. Endogenous copies of the exogenous viruses (that is, EVEs) 
were detected using the tblastn algorithm against this database. The query involved 
amino acid sequences translated from both the virus genomes newly identified 
here as well as the reference virus genomes used in this study. The threshold for 
the search was set to 100 amino acids for length and 1 x 10-*° for e-value. For 
each potential endogenous virus, the query process was reversed to determine 
their corresponding phylogenetic group. The results were also checked manually 
to exclude those sequences involved in lateral gene transfer. 
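The EVE screen is essentially a filter on tblastn hits. The following is a minimal sketch, assuming that hits were written in BLAST+ tabular format (`-outfmt 6`, default 12 columns). Three points are assumptions or omissions:

- The e-value cut-off is a required argument, because its exact value is illegible in this copy of the text.
- Applying the 100-amino-acid length threshold as "at least 100" is an assumption.
- The reverse search and the manual checks are not shown.

The hit lines in the usage example are invented.

```python
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_outfmt6(lines):
    """Yield one dict per hit from BLAST+ `-outfmt 6` text (default columns)."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        rec = dict(zip(OUTFMT6_FIELDS, line.rstrip("\n").split("\t")))
        rec["length"] = int(rec["length"])
        rec["evalue"] = float(rec["evalue"])
        rec["bitscore"] = float(rec["bitscore"])
        yield rec


def candidate_eves(lines, max_evalue, min_length=100):
    """Keep hits with alignment length >= min_length (aa) and e-value <= max_evalue."""
    return [h for h in parse_outfmt6(lines)
            if h["length"] >= min_length and h["evalue"] <= max_evalue]


if __name__ == "__main__":
    hits = [
        "virusX_RdRp\tchr2\t35.1\t240\t150\t3\t10\t249\t1000\t1719\t2e-30\t120.5",
        "virusX_RdRp\tchr5\t30.0\t80\t50\t1\t1\t80\t500\t739\t1e-12\t60.0",
        "virusY_GP\tscaffold9\t28.4\t150\t100\t2\t5\t154\t200\t649\t0.001\t40.2",
    ]
    # The cut-off here is chosen for illustration only.
    for hit in candidate_eves(hits, max_evalue=1e-10):
        print(hit["qseqid"], hit["sseqid"], hit["length"], hit["evalue"])
```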

Data availability. All new sequence reads generated here are available at the 
NCBI Sequence Read Archive (SRA) database under the BioProject accession 
PRJNA318834 (Supplementary Table 1). All virus genome sequences generated 
in this study have been deposited in GenBank under the accession numbers 
KX882764–KX884872 (Supplementary Table 2). All viruses discovered in this
study (fasta format), sequence alignments (fasta format), and phylogenetic trees
(newick format) are available at
https://figshare.com/articles/Redefining_the_invertebrate_RNA_virosphere/3792972.
Extended Data Figure 1 | The contribution of major viral clades to the total virome of each host phylum/order. a, b, These analyses are based on
viruses at all frequency levels (a), and viruses in which the frequency exceeds 0.1% of the total number of non-rRNA reads (b). 
Extended Data Figure 2 | Phylogenetic incongruence between the RdRp and structural proteins. a, Match between the phylogenies of the
RdRp and coat proteins (S-domain like) for non-segmented members of the Tombus–Noda clade. The relationship between the two phylogenies
is displayed to maximize topological congruence. b, The degree of phylogenetic incongruence for different pairs of structural and
non-structural phylogenies. The comparisons were based on patristic distance matrices derived from the phylogenies.
Extended Data Figure 3 | The gain and loss of RNA virus structural proteins. a, The parallel acquisition of multiple copies of structural
proteins by viruses within the Hepe-Virga clade. Left panel shows an outline of the structural part of their genomes, with homologous
structural genes marked in yellow and multiple copies of these proteins within the same genome labelled as ‘I’, ‘II’ and ‘III’. Right panel
shows a maximum-likelihood phylogeny depicting the evolutionary history of … of a glycoprotein in the genome of Hubei Lepidoptera virus 2
from the Mono-Chu Clade. Its genome is compared against that of a closely related virus (Hubei dimarhabdovirus-like virus 2). Homologous
proteins are connected with dotted lines, and the target glycoprotein is shown in red. c, Three examples of glycoprotein loss in the
Mono-Chu Clade. Homologous proteins are connected with dotted lines, and the target glycoproteins are shown in blue.
Extended Data Figure 4 | Lateral gene transfer between RNA viruses
and cellular organisms. a, Evolutionary origin of two exoribonucleases
(cd06133) in two sea-slater-associated viruses (Beihai hepe-like virus 2
and Beihai sea slater virus 4). Top, alignment of viral and (human) cellular
exoribonucleases. The solid triangles indicate the key catalytic sites. Lower
left panel shows the phylogenetic positions of the two viruses (marked
with solid red circles) whose genomes contain these exoribonucleases.
The host information for each virus is shown in parentheses. Lower right
panel shows the phylogenetic position of the virus exoribonucleases (solid
red circle) in the context of cellular exoribonucleases. b, Evolutionary
origin of viral serine proteases (cd00190). The phylogeny contains serine
proteases from RNA viruses (solid red circles), DNA viruses (solid blue
circles) and cellular organisms. Serine proteases from RNA viruses are
either highly divergent or group within the diversity of cellular proteins.
c, Relative positions of different protein domains in the replicase
of selected Hepe-Virga viruses. The domains are shown as ovals
and marked with different colours, and comprise: RdRp (cd01699),
Helicase (pfam01443), FtsJ (pfam01728), OTU (OTU-like cysteine
protease, pfam02338), Macro (cl00019), NADAR (cd15457), and viral
methyltransferase (pfam01660). More detailed depictions of lateral gene
transfer can be found in Supplementary Data 22–36.
Extended Data Table 1 | Distribution of homologous protein clusters across divergent taxonomic groups (RNA viruses, DNA viruses and cellular organisms)
Category —_ Protein (Domain) eee Astro Flavi ae eae Boned Nido Permutotetra oe Reo ae eae “ENA ae ae ae 
Capsid (S domain) pfam00729 x x x x x x x x 
Capsid (peptidase A21) pfam03566 x xX x x 
Capsid (Alvernavirus core like) N/A x x x xX 
Glycoprotein (Okavirus like) N/A x x 
Structural | Glycoprotein (Ferak virus like) N/A x xX 
aes Temagglutinin- pfam00423 x x 
Glycoprotein (Choristoneura 
rosaceana alphabaculovirus N/A x x 
GP64) 
ico (Ostreid herpesvirus N/A x x 
RNA Helicase (picorna-like) pfam00910 x xX 
Viral methyltransferase pfam01660 x x 
FtsJ-like methyltransferase pfam01728 x x x x 
Macro (ADP-ribose binding) cl00019 xX x x xX x x 
Ribonuclease III (dsRNA binding) cl00054 x x x x x x 
Non- 3'-5' exonucleases cd06133 x x 
structural 
oon motility protein cd15457 x x x x
OTU-like cysteine protease pfam02338 x x x x 
20G-Fe(II) oxygenase pfam13532 x x x x x 
Trypsin-like serine protease cl21584 x x x x x x x x
RNA 2'-phosphotransferase pfam01885 x x 
Persistent microbiome alterations modulate the rate of post-dieting weight regain


Christoph A. Thaiss*, Shlomik Itav*, Daphna Rothschild*, Mariska T. Meijer, Maayan Levy, Claudia Moresi,
Lenka Dohnalova, Sofia Braverman, Shachar Rozin, Sergey Malitsky, Mally Dori-Bachash, Yael Kuperman, Inbal Biton,
Arieh Gertler, Alon Harmelin, Hagit Shapiro, Zamir Halpern, Asaph Aharoni, Eran Segal & Eran Elinav


In tackling the obesity pandemic, considerable efforts are devoted to the development of effective weight reduction 
strategies, yet many dieting individuals fail to maintain a long-term weight reduction, and instead undergo excessive 
weight regain cycles. The mechanisms driving recurrent post-dieting obesity remain largely elusive. Here we identify an 
intestinal microbiome signature that persists after successful dieting of obese mice and contributes to faster weight regain 
and metabolic aberrations upon re-exposure to obesity-promoting conditions. Faecal transfer experiments show that the
accelerated weight regain phenotype can be transmitted to germ-free mice. We develop a machine-learning algorithm 
that enables personalized microbiome-based prediction of the extent of post-dieting weight regain. Additionally, we 
find that the microbiome contributes to diminished post-dieting flavonoid levels and reduced energy expenditure, and 
demonstrate that flavonoid-based ‘post-biotic’ intervention ameliorates excessive secondary weight gain. Together, 
our data highlight a possible microbiome contribution to accelerated post-dieting weight regain, and suggest that 
microbiome-targeting approaches may help to diagnose and treat this common disorder. 


The past century has witnessed an alarming increase in the prevalence 
of obesity, with over 44% of the adult world population estimated to be 
overweight, and over 300 million adults suffering from morbid obesity. 
Obesity is considered a major risk factor for ‘metabolic syndrome’ and 
its complications, with consequences for life expectancy, quality of life 
and healthcare costs’. 

Despite continuous medical and scientific effort, long-term strategies 
aimed at attenuating or reversing the obesity epidemic have yielded 
disappointing results. While a plethora of dietary approaches efficiently 
induce weight reduction, in up to 80% of cases in which weight loss was 
initially successful, reduced weight is not maintained, and instead is 
followed by recurrent weight gain and relapsing metabolic complica- 
tions within 12 months of initial weight reduction that may even exceed 
the pre-dieting metabolic derangements”. Post-dieting weight regain 
is substantially influenced by non-genetic factors, as exemplified by 
progressively worsening weight regain in weight-cycling twins as com- 
pared to their non-dieting siblings’, and is suggested to be independent 
of starting weight’ and level of exercise’. Thus, the mechanisms under- 
lying the weight-cycling-induced obesity phenomenon, commonly 
referred to as the ‘yo-yo effect’, remain to be determined®”.

An emerging factor affecting human metabolic homeostasis and 
the risk for obesity and its metabolic complications is the intestinal 
microbiome. Compositional and functional microbiome alterations, 
termed dysbiosis, have been suggested to contribute to the pathogenesis 
of obesity in both animal models and humans*”. Moreover, dietary 
changes have been demonstrated to be central drivers of microbiome 
composition and function, and markedly affect the microbiome within 
days of initiation'®". 

In this study, we used mouse models of weight loss and recurrent
obesity to investigate the mechanisms underlying exacerbated
metabolic complications following weight cycling. We find that, in
contrast to obesity-associated metabolic derangements that can be 
efficiently reverted upon dieting, obesity-induced alterations to the 
microbiome persist over long periods of time and enhance the rate 
of weight regain upon a second metabolic challenge.
This post-dieting dysbiosis affects intestinal flavonoid levels, which, 
in turn, may impact host energy expenditure. We further devise a 
machine-learning algorithm that successfully predicts the personal- 
ized propensity for recurrent diet-induced obesity solely on the basis 
of microbiome composition and demonstrate that faecal microbiome 
transplantation (FMT) or metabolite-based treatment may ameliorate 
exacerbated post-dieting weight regain. 


Enhanced weight regain after dieting 

To study the mechanisms modulating post-dieting weight regain, we
used a mouse model of recurrent obesity, in which mice were exposed
to cycles of a high-fat diet (HFD), interleaved with normal chow (NC)
consumption (cycHFD, Fig. 1a). In these mice, weight gain and
metabolic syndrome developed during primary exposure to HFD,
followed by recuperation and weight reduction during exposure
to NC, and then by re-emergence of weight gain and associated
metabolic disturbances in subsequent HFD-mediated obesity cycles. As
controls, we used mice continuously fed a HFD, mice continuously fed
a NC diet, and mice that were exposed to only a single cycle of HFD
(primHFD, Fig. 1a). As observed in recurrently dieting humans’, a
preceding obesity-weight-loss cycle rendered mice susceptible to
accelerated secondary weight gain, even after fully returning to baseline
weight (Fig. 1b, c and Extended Data Fig. 1a, b). As a result, the net
weight gain, that is, the weight gained during identical durations of
high-fat feeding, was higher in the weight-cycling group than in
mice continuously fed a HFD (Extended Data Fig. 1c). The maximal
weight reached by both weight-cycling and continuous HFD groups
was comparable, and higher than in mice exposed to a first cycle of
HFD (Fig. 1b). Moreover, as compared to mice exposed to a single
cycle of HFD, recurrent obesity was characterized by a significant
increase in total body fat as determined by MRI (Extended Data
Fig. 1d, e), enhanced glucose intolerance (Fig. 1d, e), and elevated serum
levels of leptin and low-density lipoprotein (LDL), but not high-density
lipoprotein (HDL) (Extended Data Fig. 1f–h). Accelerated weight
regain in post-dieting mice was associated with decreased energy
expenditure (Fig. 1f and Extended Data Fig. 1i–p), while food intake
remained unaffected (Extended Data Fig. 1q).

Figure 1 | Enhanced recurrent weight gain after treatment of obesity.
a–c, Schematic (a), weight curves (b), and weight regain quantification (c)
of mice undergoing weight cycling and controls. a.u., arbitrary units.
d–f, Glucose levels after glucose tolerance test (d), glucose quantification
(e) and energy expenditure over 48 h (f) during weight regain (week 10).

Similarly, exacerbated metabolic derangements following a weight 
gain/weight reduction cycle were observed when weight loss was phar- 
macologically accelerated by celastrol, a quinone methide recently 
found to induce weight loss!* (Extended Data Fig. 2a, b). Upon
reinstatement of HFD, mice developed significantly exacerbated
secondary weight gain (Fig. 1g, Extended Data Fig. 2c–e), as compared to
celastrol-treated controls without preceding obesity. Furthermore, to 
model recurrent obesity induced by hyperphagia rather than dietary 
composition, we pharmacologically inhibited leptin signalling!**. 
Mice administered a leptin antagonist for one week while consuming 
NC significantly gained weight, and fully returned to normal weight 
upon cessation of leptin antagonist treatment (Fig. 1h). Upon a 
second challenge with the leptin antagonist, these mice featured a more 
pronounced weight regain compared to mice administered the leptin 
antagonist for the first time (Fig. 1h and Extended Data Fig. 2f, g). 


g-i, Recurrent weight gain in mice treated with celastrol (g), leptin 
antagonist (LeptAnt; h), or up to three HFD cycles (i). Coloured bars 
below weight curves depict durations of the indicated treatments. 
Experiments were repeated at least twice. Data are mean ± s.e.m. *P < 0.05
by ANOVA. See Supplementary Tables 5 and 6 for exact n and P values. 


When exposed to a third cycle of HFD-induced obesity, weight-cycling
mice exhibited a further exacerbation in weight gain (Fig. 1i),
obesity (Extended Data Fig. 2h), and dyslipidemia (Extended Data 
Fig. 2i) as compared to control animals experiencing secondary or 
primary weight-gain cycles. Together, these experiments suggest that 
previous obesity-dieting cycles progressively enhance the susceptibility 
to accelerated weight regain and associated metabolic complications. 


Persistence of post-dieting microbiome alterations

Given the above results, we hypothesized that initial obesity had caused
persistent abnormalities that predisposed mice to relapsing metabolic
disease upon re-feeding with HFD. We therefore performed metabolic
profiling during the first episode of obesity (Extended Data Fig. 2j)
and at the ‘nadir’ phase (Extended Data Fig. 3a), that is, when pre- 
viously obese mice had returned to normal weight that was indistin- 
guishable from that of NC-fed controls. Despite marked metabolic 
derangements during the primary obesity phase (Extended Data 
Fig. 2k-o), neither body fat content, serum cholesterol, glucose 
tolerance, nor serum insulin levels were significantly different between 
post-dieting mice and their non-cycling controls during the nadir 
post-obesity phase (Extended Data Fig. 3b-f). Similarly, other hall- 
marks of obesity, such as oxygen consumption, energy expenditure, 
physical activity, as well as food and drink intake fully returned to 
normal baseline levels upon weight loss during the nadir phase 
(Extended Data Fig. 3g-] and Extended Data Fig. 4a-l). 

In contrast, the composition of the intestinal microbiota, which
had assumed a dysbiotic state during the primary obesity phase,
did not return to its original composition at the time of post-dieting
weight and metabolic normalization (Fig. 2a–d and Extended Data
Fig. 5a). Instead, the microbiota assumed an intermediate configuration
between the dysbiotic and normal states (Fig. 2d and Extended
Data Fig. 5a). A similar microbiome configuration shift was observed
after recovery from a second cycle of recurrent obesity (Fig. 2e, f and
Extended Data Fig. 5a). We confirmed these findings by targeting an
alternative region of the 16S locus for amplicon sequencing (Extended
Data Fig. 5b–d). Notably, in addition to the significant alteration
in bacterial composition persisting after metabolic normalization
(Fig. 2g), bacterial alpha diversity was reduced during the obese state
and did not recover upon return to normal weight and metabolic
homeostasis (Fig. 2h). To determine the operational taxonomic units
(OTUs) that remained altered after dieting, we normalized the OTU
abundance of weight-cycling mice to age-matched NC controls and
classified the OTUs according to their temporal behaviour. Only 45%
of all OTUs returned to pre-obesity levels after dieting (Fig. 2i
and Extended Data Fig. 5e, f), while obesity-induced effects on the
microbiome persisted in multiple bacterial taxa (Extended Data
Fig. 5g–i). Similarly, persistent post-obesity microbiota alterations in
composition and alpha diversity were noted in mice in which weight
loss had been aided with celastrol treatment (Extended Data Fig. 5j–m).

Figure 2 | Persistent microbiome alterations after weight loss.
a, Schematic of sampling times for microbiota analysis. b–f, Principal
coordinate analyses (PCoA) of unweighted UniFrac distances of
microbiota composition at the indicated time points. g–i, PCoA (g), alpha
diversity (h), and OTU heatmap (i) of weight-cycling mice and controls
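Normalizing each OTU to age-matched NC controls and grouping OTUs by temporal behaviour can be sketched as a simple classification. The category definitions used in the study are not given here, so this is a hypothetical scheme:

- an OTU counts as altered when its absolute log2 fold change versus NC reaches a chosen threshold;
- it counts as recovered when that change disappears after dieting.

The threshold, pseudocount, category labels and function names are all assumptions for illustration.

```python
import math


def log2_ratio(case: float, control: float, pseudocount: float = 1e-6) -> float:
    """log2 fold change of a relative abundance versus the age-matched NC control."""
    return math.log2((case + pseudocount) / (control + pseudocount))


def classify_otu(obese: float, nc_obese: float, nadir: float, nc_nadir: float,
                 threshold: float = 1.0) -> str:
    """Classify one OTU by its NC-normalized behaviour at two time points.

    obese / nc_obese: relative abundance in weight-cycling vs NC mice during HFD
    nadir / nc_nadir: the same after dieting (weight and metabolism normalized)
    threshold: |log2 fold change| regarded as altered (hypothetical cut-off)
    """
    during = log2_ratio(obese, nc_obese)
    after = log2_ratio(nadir, nc_nadir)
    if abs(during) < threshold:
        return "not altered by HFD"
    if abs(after) < threshold:
        return "recovered"
    if (during > 0) == (after > 0):
        return "persistent"
    return "reversed direction"


if __name__ == "__main__":
    print(classify_otu(0.08, 0.01, 0.012, 0.01))  # recovered
    print(classify_otu(0.08, 0.01, 0.04, 0.01))   # persistent
```

Counting the "recovered" OTUs among all HFD-altered OTUs would give a figure comparable in spirit to the fraction reported above. The exact value depends on the unstated criteria.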

To determine the functional consequences of this incomplete 
post-dieting microbiota recovery, we performed shotgun metagenomic 
sequencing and normalized the temporal behaviour of gene
abundances to NC controls. We identified 773 bacterial genes whose
abundance was altered by HFD and did not return to control levels after
dieting (Extended Data Fig. 6a–c). Likewise, microbial functionalities
did not fully recover in previously dieting mice, both at the level of gene
modules (Extended Data Fig. 6d) and functional pathways (Fig. 2j).
Similar to OTUs, reversal of obesity led only to a partial recovery 
of microbial functions (Extended Data Fig. 6e-h), with multiple 
obesity-induced microbiome aberrations persisting during weight 
loss (Fig. 2k). Abundances of genes from multiple metabolic pathways, 
including isoflavonoid and steroid biosynthesis, were reduced during 
high-fat feeding and did not recover upon dieting (Extended Data 
Fig. 6i, j). Collectively, these data indicate that reversal of obesity by 
dieting results in a microbiome configuration that remains altered, as 
compared to control mice without prior obesity, even when a state of 
metabolic normalization is reached. 


The post-dieting microbiome contributes to weight regain

We next sought to investigate whether the persistent post-obesity
microbiome signature was causally involved in the metabolic
complications associated with recurrent weight gain. To this end, we treated
mice with broad-spectrum antibiotics during the post-obesity weight
loss period (Fig. 3a). As expected, antibiotic treatment during dieting
abolished the post-obesity microbiome signature and equilibrated the
microbiota composition and alpha diversity between previously obese
mice and controls, while the microbiota of untreated mice maintained
an intermediate configuration (Fig. 3b and Extended Data Fig. 7a, b).
Remarkably, antibiotic treatment also abrogated the exacerbation
of metabolic derangements upon re-exposure to a HFD, including
weight gain, body fat content and glucose intolerance, as compared
to untreated weight-cycling mice (Fig. 3c, d and Extended Data
Fig. 7c–e).

Figure 3 | Post-dieting microbiome alterations drive exacerbated weight
regain. a, Schematic of microbiota equilibration by antibiotics during
weight loss. b–d, PCoA of faecal microbiota (b), weight curve (c) and body
fat content (d) during weight regain of weight-cycling mice undergoing
antibiotic (Abx) treatment during weight loss. e, Weight curve of mice
monitored for microbiota equilibration after dieting before induction
of secondary obesity. f, Microbial dissimilarity expressed as differential
UniFrac distance over time between cycHFD and HFD mice compared
to NC controls. g, h, Weight curve (g) and glucose levels after glucose
tolerance test (h) in germ-free recipients one week after microbiota
transfer from cycHFD and NC mice. Experiments were repeated twice.
Data are mean ± s.e.m. *P < 0.05 by ANOVA; NS, not significant.
See Supplementary Tables 5 and 6 for exact n and P values.

We next determined the time required for spontaneous reversal of 
the persistent post-dieting microbiome alterations. To this end, we 
monitored the microbiota composition every three weeks upon return 
of previously obese mice to phenotypic normality (Fig. 3e). Notably, 
spontaneous reversion of the post-cycling microbiota composition back
to NC configuration was achieved only 21 weeks after the completion 
of successful dieting, that is, a time period more than five times 
longer than the initial weight gain or dieting period (Fig. 3f and 
Extended Data Fig. 7f-h). The return to compositional normality in the 
post-obesity microbiota was associated with gradual acquisition or loss 
of bacterial taxa (Extended Data Fig. 7i-k). Importantly, following the 
spontaneous microbiota equilibration, re-exposure to a HFD resulted 
in an indistinguishable weight gain of formerly obese mice compared 
to the primary weight gain of HFD-fed mice (Fig. 3e and Extended 
Data Fig. 7l).

Additionally, we performed faecal transfer experiments, in which the 
microbiota from previously obese mice (cycHFD) and from phenotypi- 
cally identical controls (NC) were transferred to germ-free mice, which 
were subsequently fed either NC or a HFD (Extended Data Fig. 8a). 
The compositional differences between the microbiota from formerly 
weight-cycling mice and controls persisted in germ-free recipients 
(Extended Data Fig. 8b-d). Notably, under NC conditions, naive and 
post-cycling microbiome-transplanted germ-free mice featured similar 
weight and glucose tolerance (Fig. 3g, h and Extended Data Fig. 8e, f), 
suggesting that the post-obesity microbiome did not feature intrinsic 
obesogenic properties. In contrast, when fed a HFD, recipients of
post-weight-cycling microbiota exhibited significantly enhanced weight gain
and glucose intolerance as early as one week after faecal transplantation, 
as compared to recipients of microbiome from NC-consuming mice 
(Fig. 3g, h and Extended Data Fig. 8e, f). Thus, enhanced metabolic 
derangements in cycling microbiome-transplanted HFD-fed germ-free 
mice developed even in the absence of previous bouts of obesity in 
recipient mice, indicating that the post-dieting microbiome configura- 
tion coupled with a secondary obesogenic challenge suffices to induce 
an enhanced metabolic phenotype. Together, these data suggest that 
the post-dieting microbiota contributes to the susceptibility to develop 
aggravated metabolic complications upon re-exposure to obesity- 
inducing conditions. 


Microbiota composition predicts weight regain 

Given the above causal connection between microbiome configuration 
and post-dieting weight regain, we asked whether the extent of recurrent 
weight gain could be computationally predicted for each individual 
mouse based on its microbiota composition at the post-dieting nadir 
period. We therefore profiled the microbiota composition of 25 mice 
that had undergone post-obesity dieting until metabolic normality 
and 25 weight-matched NC controls (Fig. 4a). We first devised a 
machine-learning algorithm, based solely on the microbiota compo- 
sition, aimed at predicting a history of obesity or lack thereof (Fig. 4a, 
see Methods). Notably, the derived random forest classifier predicted 
obesity history nearly perfectly (AUC = 0.96, Fig. 4b). 

We then attempted to predict the exact extent of weight regain of 
each mouse upon secondary exposure to HFD, based on the mouse’s 
post-dieting microbiome configuration. Prediction solely on the 
basis of obesity history (that is, without any machine-learning model
employed) yielded a low prediction accuracy (R = 0.21, Extended
Data Fig. 8g). By contrast, a 16S rDNA-based prediction using a leave- 
one-out cross-validation scheme performed significantly better 
(R=0.58, Extended Data Fig. 8h). Notably, a two-step microbiome- 
based algorithm that first predicts obesity history and then predicts 
weight regain on the basis of the predicted history, achieved a highly 
accurate prediction of the extent of weight regain across individual 
mice (R=0.72, Fig. 4c). 
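The two-step, leave-one-out scheme can be sketched in a few lines. This is not the authors' model. To stay NumPy-only, the sketch swaps the random forest for two simpler stand-ins:

- a nearest-centroid score as step 1 (obesity history);
- a linear fit of weight regain on that score as step 2.

It illustrates only the two-step structure and the hold-one-mouse-out loop. Treating R as the Pearson correlation between predicted and observed regain is an assumption, and the data in the usage example are synthetic.

```python
import numpy as np


def centroid_score(X_train, history_train, X):
    """Step 1 stand-in: positive when a sample is closer to the 'previously obese'
    centroid than to the 'never obese' centroid (replaces the random forest)."""
    obese = X_train[history_train == 1].mean(axis=0)
    naive = X_train[history_train == 0].mean(axis=0)
    return np.linalg.norm(X - naive, axis=1) - np.linalg.norm(X - obese, axis=1)


def two_step_loocv(X, history, regain):
    """Leave-one-out predictions of weight regain via predicted obesity history.

    X: mice x OTU relative-abundance matrix; history: 1 = previously obese, 0 = not;
    regain: observed weight regain. Returns (predictions, Pearson R).
    """
    X = np.asarray(X, dtype=float)
    history = np.asarray(history, dtype=int)
    regain = np.asarray(regain, dtype=float)
    n = len(regain)
    predicted = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i  # hold out mouse i
        s_train = centroid_score(X[train], history[train], X[train])
        s_test = centroid_score(X[train], history[train], X[i:i + 1])
        # Step 2: linear fit of regain on the step-1 score, learned on training mice only.
        slope, intercept = np.polyfit(s_train, regain[train], deg=1)
        predicted[i] = slope * s_test[0] + intercept
    r = float(np.corrcoef(predicted, regain)[0, 1])
    return predicted, r


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    history = np.array([0] * 10 + [1] * 10)
    X = rng.normal(size=(20, 5)) + 2.0 * history[:, None]  # synthetic "OTU" features
    regain = 1.0 + 2.0 * history + rng.normal(scale=0.3, size=20)
    _, r = two_step_loocv(X, history, regain)
    print(f"LOOCV Pearson R = {r:.2f}")
```

Because the held-out mouse never contributes to either step's fit, the correlation reflects out-of-sample performance. This is the point of the leave-one-out scheme described in the text.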

To determine which features of the microbiota contribute to the
algorithm's ability to predict the degree of weight gain, we ranked all
OTUs detected in the faecal microbiomes according to feature importance
for prediction (Fig. 4d). A total of 189 OTUs contributed
to the algorithm's predictions, and the magnitude of their contributions
displayed a continuum with no OTU standing out as a major
contributor (Fig. 4e). This suggests that the composition of the commensal
bacteria as a whole, rather than a small subset of species, drives the
post-dieting microbiome alterations that contribute to the susceptibility
to relapsing obesity. Together, these results indicate that the
microbiota configuration may be used to predict the history of HFD-induced
obesity in mice as well as the extent of personalized weight regain that
occurs upon recurrence of similar obesity-inducing conditions.

Figure 4 | Accurate prediction of post-dieting weight regain by
microbiota features. a, Schematic of microbiota-based prediction of
weight-gain history and weight regain upon HFD feeding. b, c, Prediction
of prior obesity (b) and weight regain (c) based on 16S data. d, Ranked list


Metabolites contribute to post-dieting weight regain 

We next determined whether microbiome modulation during the post-weight-cycling period could ameliorate the extent of secondary weight gain and its metabolic complications. To this end, we performed daily faecal microbiome transplantation (FMT) for 4 weeks, using ‘naive’ or post-dieting donor microbiomes transferred into colonized weight-cycling mice during the nadir post-dieting period (Fig. 5a). Upon transplantation, compositional differences between the microbiota from formerly weight-cycling mice and controls persisted in the corresponding FMT recipients (Extended Data Fig. 8i-k). Notably, recipients of a non-cycling ‘healthy’ microbiome during the nadir post-obese period exhibited ameliorated secondary weight gain (Fig. 5b and Extended Data Fig. 8l, m), reduced glucose intolerance (Fig. 5c and Extended Data Fig. 8n), decreased body fat (Fig. 5d, e), and increased lean mass (Extended Data Fig. 8o) as compared to mice undergoing a control FMT with a post-cycling microbiome. These results indicate that restoration of normal microbiota function after dieting may prevent exacerbation of metabolic derangements upon weight regain.

Given the effectiveness of FMT, we sought to gain further insight into how microbiota replenishment ameliorates the propensity for recurrent obesity after weight loss. To this end, we longitudinally compared the faecal metabolomic profiles of mice undergoing HFD-induced weight cycling and control NC-fed mice. We normalized the metabolite levels in weight-cycling mice to those of age-matched NC controls at each time point and then classified each metabolite according to its temporal pattern. As expected, HFD induced major alterations in the faecal metabolome, which were partially reversed upon subsequent weight loss (Fig. 5f, g and Extended Data Fig. 9a-d). However, for nearly half of all metabolites altered by HFD, including several bile acids, the obesity-induced changes persisted after return to phenotypic normality (Fig. 5f, g and Extended Data Fig. 9e-h).
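A minimal sketch of the normalization step described above, assuming the control-normalized value is the log2 ratio of group means at each time point. The rule in `classify_pattern` (a two-fold threshold and a 'persistent'/'reversed'/'unaltered' labelling) is a hypothetical simplification for illustration only; the Methods state that temporal patterns were classified by K-means clustering.

```python
import numpy as np


def normalized_log2fc(cycling, control, pseudocount=1e-9):
    """Per-metabolite log2(mean weight-cycling level / mean age-matched NC level)
    at one time point. Inputs have shape (n_mice, n_metabolites)."""
    cyc = np.asarray(cycling, dtype=float).mean(axis=0)
    ctl = np.asarray(control, dtype=float).mean(axis=0)
    # A small pseudocount avoids log(0) for undetected metabolites.
    return np.log2((cyc + pseudocount) / (ctl + pseudocount))


def classify_pattern(fc_during, fc_after, threshold=1.0):
    """Hypothetical rule: 'unaltered' if not changed during obesity,
    'persistent' if still changed in the same direction after weight loss,
    otherwise 'reversed'. threshold=1.0 corresponds to a two-fold change."""
    labels = []
    for during, after in zip(fc_during, fc_after):
        if abs(during) < threshold:
            labels.append("unaltered")
        elif abs(after) >= threshold and np.sign(after) == np.sign(during):
            labels.append("persistent")
        else:
            labels.append("reversed")
    return labels


if __name__ == "__main__":
    controls = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
    fc_during = normalized_log2fc([[0.25, 4.0, 1.0]], controls)
    fc_after = normalized_log2fc([[0.25, 1.0, 1.0]], controls)
    print(classify_pattern(fc_during, fc_after))  # ['persistent', 'reversed', 'unaltered']
```

Normalizing to age-matched controls at every time point removes changes that would occur with ageing alone, so only diet-related deviations remain.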

Among the metabolites most significantly depleted by HFD whose 
levels did not recover upon regain of metabolic health were the dietary 
flavonoids apigenin and naringenin (Fig. 5f-i). Both compounds 
remained suppressed for as long as 15 weeks after weight normalization 
(Fig. 5h, i). Flavonoids are commonly ingested diet-derived compounds 
that are metabolized by the intestinal microbiota’®. The microbiome 
contribution to intestinal flavonoid levels was evident from the elevated 
levels of apigenin and naringenin in antibiotic-treated or germ-free mice (Fig. 5j). We therefore hypothesized that a combination of dietary flavonoid availability and microbiome-mediated flavonoid-degrading capacity may contribute to the total intestinal flavonoid pool. To test this, we followed the kinetics of flavonoids, of the flavonoid-biosynthetic enzyme chalcone synthase and of the flavonoid-degrading enzyme flavanone 4-reductase (Extended Data Fig. 9i and Supplementary Table 1) over the course of an obesity/weight loss/weight regain cycle (Extended Data Fig. 10a).

During primary obesity, low intestinal flavonoid levels (Fig. 5h, i) were attributable to low flavonoid availability in the HFD (Extended Data Fig. 10b) coupled with a microbiome shift towards a flavonoid-degrading configuration, as evident from the increase in flavanone 4-reductase levels without a similar increase in chalcone synthase levels (Extended Data Fig. 10c, d). During induction of weight loss by reversion to a NC diet, dietary flavonoid availability returned to its normally high level (Extended Data Fig. 10b), yet intestinal flavonoid levels remained persistently low, including those of the naringenin-derived eriodictyol (Fig. 5h, i and Extended Data Fig. 10e). At this nadir phase, the flavonoid-degrading microbiome contributed to the persistently low flavonoid levels, as suggested by elevated levels of flavanone 4-reductase (Extended Data Fig. 10c) and by the effect of antibiotic treatment during the weight-reduction phase, which diminished the levels of flavanone 4-reductase (Extended Data Fig. 10f) and normalized flavonoid levels (Fig. 5k). Upon acute secondary induction of obesity (Extended Data Fig. 10g), low dietary flavonoid availability (Extended Data Fig. 10b), combined with the long-standing presence of a flavonoid-degrading microbiome and the associated low flavonoid levels, probably contributed to the reduced flavonoid levels in weight-cycling mice as compared to controls undergoing primary weight gain (Extended Data Fig. 10h, i).

Figure 5 | Microbiome modulation ameliorates post-dieting weight regain. a-e, Schematic (a), weight curve (b), glucose level quantification after glucose tolerance test (c), body fat content (d), and representative MRI scans (e) during weight regain of weight-cycling mice undergoing faecal microbiota transplantation (FMT). f, g, Relative abundances (f) and heatmap (g) of intestinal metabolites before, during, and after obesity. h-k, Normalized intestinal abundances of apigenin and naringenin before, during, and after obesity, as well as 15 weeks after successful dieting (h, i), in antibiotic-treated (Abx), germ-free (GF), and specific-pathogen-free (SPF) mice (j), and in mice treated with antibiotics during weight loss (k). Experiments were repeated twice. Data are mean ± s.e.m. *P < 0.05, **P < 0.01, ***P < 0.001 by ANOVA (c-j) or Mann-Whitney U-test (k). See Supplementary Tables 5 and 6 for exact n and P values.

Apigenin and naringenin have been reported to affect food intake, adipocyte differentiation, and lipid metabolism! '*. We therefore hypothesized that, similarly to the FMT-induced restoration of the naive flavonoid-degrading capacity of the microbiome in weight-cycling mice, direct flavonoid replenishment in these mice may ameliorate the exacerbated relapsing obesity phenotype. Indeed, daily oral administration of both flavonoids to post-dieting mice during the post-dieting nadir and secondary weight regain periods (Fig. 6a) normalized intestinal apigenin and naringenin levels to control levels (Extended Data Fig. 10j), with no effect noted on microbiome composition (Extended Data Fig. 10k). Similar to FMT, combined flavonoid treatment ameliorated the rate of secondary weight regain (Fig. 6b, c and Extended Data Fig. 10l, m). Together, these results suggest that low apigenin and naringenin levels in post-dieting mice contribute to exacerbated weight regain, whereas their therapeutic replenishment ameliorates this susceptibility.


Flavonoids modulate weight regain and UCP1 expression 
To investigate possible mechanisms by which apigenin and naringenin 
ameliorate recurrent post-dieting obesity, we measured metabolic and 
behavioural parameters in flavonoid-administered weight-cycling mice, 
as compared to untreated weight-cycling mice. Notably, weight-adjusted 
energy expenditure was markedly reduced in weight-cycling mice 
(Fig. 6d-f and Extended Data Fig. 11a-f), but was normalized upon flavonoid administration (Fig. 6d-f and Extended Data Fig. 11a-f), suggesting that apigenin and naringenin might impact host energy expenditure. Of note, a similar effect of flavonoids on weight management was observed in previous studies!®, although the link between flavonoid supplementation and enhanced energy expenditure was evident only upon normalization of energy expenditure to body weight. Other metabolic parameters were not affected by flavonoid treatment (Extended Data Fig. 11g, h and Extended Data Fig. 12a-f). Similarly, mice treated with antibiotics during the nadir period exhibited enhanced energy expenditure upon re-administration of HFD (Fig. 6g, Extended Data Fig. 12g-l and Extended Data Fig. 13a-c), in line with their higher flavonoid levels (Fig. 5k) and the amelioration of exacerbated weight regain (Fig. 3c), whereas other metabolic parameters during recurrent weight gain were unaltered (Extended Data Fig. 13d-l).
Figure 6 | Metabolite modulation ameliorates post-dieting weight regain. a-f, Schematic (a), weight curve (b), representative MRI scans (c), representative energy expenditure recording (d), and quantifications during the dark phase (night) (e) and light phase (day) (f) of weight-cycling mice with or without supplementation of apigenin and naringenin (A/N) during weight regain. g, Energy expenditure recording during weight regain of weight-cycling mice with or without antibiotic treatment before weight regain. h-m, Brown adipose tissue (BAT) Ucp1 transcript (h-k) or UCP1 protein (l, m) levels in mice on HFD or NC receiving A/N or vehicle by daily gavage for 2 weeks (h), in weight-cycling mice with or without A/N treatment during weight regain (i), in BAT explants cultured with A/N for 24 h (j), and in mice during weight regain after antibiotic treatment during weight loss (k-m). n, Model of diet-microbiota-energy-expenditure interactions during dieting and weight regain. Experiments were repeated twice. Data are mean ± s.e.m. *P < 0.05, **P < 0.01, ****P < 0.0001 by ANOVA (e, f, h, j) or Mann-Whitney U-test (i, k, m). See Supplementary Tables 5 and 6 for exact n and P values.

Finally, we pursued possible mechanisms by which flavonoids may participate in the regulation of host energy expenditure. Since brown adipose tissue (BAT) is a major regulator of thermogenesis in mammals, and since other members of the flavonoid family have been previously associated with the induction of the major thermogenic factor uncoupling protein 1 (UCP1)*°’, we analysed Ucp1 expression in mice fed NC or HFD and orally administered apigenin and naringenin. As early as two weeks after the start of flavonoid treatment, Ucp1 transcript levels were significantly elevated in the BAT of mice fed a HFD, but not in those fed NC (Fig. 6h). Likewise, Ucp1 expression was elevated in weight-cycling mice upon apigenin and naringenin supplementation during the HFD-induced weight-regain period (Fig. 6i). Flavonoids also induced Ucp1 in BAT explants in a concentration-dependent manner (Fig. 6j), suggesting a direct effect of apigenin and naringenin on gene expression in BAT. Given that antibiotic treatment elevated intestinal flavonoid levels (Fig. 5k) and host energy expenditure (Fig. 6g and Extended Data Fig. 12h, l), we determined the levels of UCP1 in the BAT of antibiotic-treated weight-cycling mice. Indeed, both transcript and protein levels of UCP1 were elevated in the group receiving antibiotics (Fig. 6k-m), providing a potential mechanistic explanation for recent observations of Ucp1 induction in germ-free mice”’.

Taken together, these associations suggest a model in which HFD promotes the growth of flavonoid-metabolizing bacteria, which in turn decrease the amount of bioavailable flavonoids, thereby negatively regulating UCP1-driven energy expenditure and promoting exaggerated recurrent weight gain (Fig. 6n). Full validation of this model merits future studies investigating whether the flavonoid effect on BAT Ucp1 expression directly drives enhanced energy expenditure.


Discussion 

In this study, we describe the persistence of an altered microbiome 
configuration following cycles of obesity and dieting, which contrib- 
utes to enhanced metabolic derangements upon weight regain, through 
metabolite-induced effects on host metabolism. We hypothesize that 
diet-induced microbiome persistence may have evolved to act as a 
‘buffer’ contributing to the stability of metabolic homeostasis over 
prolonged periods of time, by preventing overly fluctuating metabolic 
responses to incidental nutritional or environmental signals. However, 
in contexts of erratic changes in host physiology, such as cycling weight 
gain and dieting, this microbiome persistence may predispose the host 
to exaggerated metabolic consequences in ensuing weight-gain cycles. 
Similar contexts of microbiome persistence include its hysteresis-like 
behaviour of reduced reversibility during recurrent dietary changes”, 
or following low-fibre feeding”. 

Our results highlight two potentially interdependent yet inherently different microbiota effects on weight and metabolism. First, the microbiota from obese donors induces weight gain in faecal-transplanted mice even when recipient mice are maintained on a normal chow diet®°. This dominant obesogenic property is lost upon remission of obesity. Second, the persistent post-dieting microbiota influences the degree of relapsing obesity following weight cycling, but only upon encounter of a ‘second hit’ that gives rise to recurrent weight gain. While we suggest one mechanism for the common failure of formerly obese individuals to maintain long-term reduced weight after dieting, the reasons for this failure are probably complex and include contributions from a multitude of behavioural, genetic, environmental, and metabolic factors”®. The findings described here suggest that the remission of metabolic derangements after treatment of obesity precedes the remission of dysbiosis, with the time window of post-obesity microbiome persistence marking the susceptibility phase for accelerated recurrent obesity. Corroboration of these findings in dieting humans, as well as assessment of additional variables not reflected in mouse models, merits further prospective human studies.

Finally, our study provides an example of how rational post-biotic metabolite therapy could serve as a potential means of modulating physiological function downstream of the microbiota. Specifically, we found that obesity-induced loss of the flavonoids apigenin and naringenin enhances the susceptibility for accelerated weight regain, potentially through impairment of energy expenditure, while replenishment
of these metabolites ameliorated these metabolic abnormalities. Future 
studies are warranted to examine the potential clinical use of flavo- 
noids, as well as modulation of other bioactive metabolites such as bile 
acids that we found to be persistently elevated after dieting, as novel 
therapeutics in the quest for effective long-term weight management 
solutions. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Mice. C57BL/6 mice were purchased from Harlan and allowed to acclimatize to the animal facility environment for 2 weeks before being used for experimentation. In each experiment, all mice were littermates born and raised in the same vivarium and obtained through a single delivery. All mice were maintained on a strict 12-h light-dark cycle (lights turned on at 6 am and turned off at 6 pm) and were housed in cages containing a maximum of five animals. Numbers of animals were chosen to ensure that a minimum of two distinct cages was used per experimental group in each experiment. No statistical methods were used to predetermine sample size. Mice were taken out of experiments when wounded as a result of fighting among cage-mates. Weights were always measured at the same circadian time throughout each experiment. Other than for weight and glucose measurements, investigators were blinded with regard to experimental groups. Outbred Swiss Webster germ-free mice were born in the Weizmann Institute germ-free facility and routinely monitored for sterility. For faecal transplantation experiments, 100 mg of stool was resuspended in 1 ml of PBS under anaerobic conditions, homogenized, and filtered through a 70-µm strainer. Recipient mice were gavaged with 200 µl of the filtrate.

All experiments involving weight cycling employed the following experimental 
paradigm: 

1) Before dietary interventions, mice were randomized to ensure that no 
incidental pre-diet differences in body weight, body fat content, or microbiome 
composition existed between the different groups (Supplementary Fig. 1). 

2) Initial weight gain for 4 weeks (‘during obesity’ time point, 4 weeks), see also 
Supplementary Figs 2 and 3. 

3) Weight loss until lean control levels are reached (end-point criterion for 
weight loss: no statistically significant weight difference between cycling group 
and lean controls; ‘after obesity’ time point). 

4) Weight regain until obese control levels are reached (end-point criterion for 
weight gain: no statistically significant weight difference between cycling group 
and obese controls; ‘second obesity’ time point). 

Age-matched male mice were used in all experiments except that shown in Fig. 1h, in which female mice were used. Mice were 8 weeks of age at the beginning of experiments. For antibiotic treatment, mice were given a combination of vancomycin (0.5 g l⁻¹), ampicillin (1 g l⁻¹), neomycin (1 g l⁻¹), and metronidazole (1 g l⁻¹) in their drinking water’. Mice were carefully monitored for signs of dehydration upon antibiotic administration. All antibiotics were obtained from Sigma Aldrich and given for the time periods indicated in each figure. For flavonoid treatments, apigenin and naringenin (obtained from Sigma Aldrich) were dissolved in DMSO and administered daily by oral gavage at a dose of 80 mg kg⁻¹. Controls received vehicle gavages. Celastrol was administered daily by intraperitoneal injection of 100 µg kg⁻¹ as previously described’. Leptin antagonist was administered daily by intraperitoneal injection of 25 mg kg⁻¹ as previously described!’.

Rodent diets are detailed in the Supplementary Information (Supplementary 
Tables 2 and 3). Stool samples were collected fresh and on the basis of individual 
mice. Fresh stool samples were collected into tubes, immediately snap-frozen in liquid nitrogen upon collection, and stored at −80 °C until DNA isolation. All experimental procedures were approved by the local ACUC.

Glucose tolerance test. Mice were fasted for 6 h and subsequently given 200 µl of a 0.2 g ml⁻¹ glucose solution (JT Baker) by oral gavage. Blood glucose was determined at 0, 15, 30, 60, 90 and 120 min after glucose challenge (Contour blood glucose meter, Bayer).
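Glucose tolerance curves sampled at these six time points are often summarized as an area under the curve (AUC). The text does not state how panels such as Fig. 5c were quantified, so this trapezoidal-rule sketch, including the optional baseline subtraction, the function name `gtt_auc` and the example readings, is an assumption for illustration.

```python
# Sampling times (min) of the glucose tolerance test described above.
GTT_TIMES_MIN = (0, 15, 30, 60, 90, 120)


def gtt_auc(glucose, times=GTT_TIMES_MIN, subtract_baseline=False):
    """Area under the blood-glucose curve by the trapezoidal rule.

    glucose: readings at each time point (same order as `times`).
    subtract_baseline: if True, integrate the change from the 0-min value
    (negative increments are kept, giving a net incremental AUC).
    """
    if len(glucose) != len(times):
        raise ValueError("need one glucose reading per time point")
    values = [float(g) for g in glucose]
    if subtract_baseline:
        values = [g - values[0] for g in values]
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += width * (values[i] + values[i - 1]) / 2.0
    return area


if __name__ == "__main__":
    readings = [100, 200, 200, 100, 100, 100]  # hypothetical mg/dl values
    print(gtt_auc(readings))                          # 15750.0
    print(gtt_auc(readings, subtract_baseline=True))  # 3750.0
```

Because the sampling interval widens from 15 to 30 min, the trapezoidal rule weights each segment by its actual duration rather than averaging the readings.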

Magnetic resonance imaging. Mice were anaesthetized with isoflurane (5% for induction, 1-2% for maintenance) mixed with oxygen (1 l min⁻¹) and delivered through a nasal mask. Once anaesthetized, the animals were placed in a head-holder to ensure reproducible positioning inside the magnet. Respiration rate was monitored and kept at around 60-80 breaths per minute throughout the experimental period. MRI experiments were performed on a 9.4 Tesla BioSpec Magnet 94/20 USR system (Bruker) equipped with a gradient coil system capable of producing pulse gradients of up to 40 gauss cm⁻¹ in each of the three directions. All MR images were acquired with a quadrature resonator coil (Bruker). The MRI protocol included two sets of multi-slice T2-weighted MR images, one axial and one coronal. The T2-weighted images were acquired using the multi-slice RARE sequence (TR = 2,500 ms, TE = 35 ms, RARE factor = 8), with a matrix size of 256 × 256 and four averages, corresponding to an image acquisition time of 160 s per set. The first set was used to acquire 21 axial slices with 1-mm slice thickness (no gap) and a field of view of 4.2 cm × 4.2 cm. The second set was used to acquire 17 coronal slices with 1-mm slice thickness (no gap) and a field of view of 7.0 cm × 5.0 cm. Total fat and lean mass of mice were quantified by EchoMRI-100 (Echo Medical Systems).
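As a worked example of what these imaging parameters imply, the nominal in-plane pixel size is the field of view divided by the matrix size. This assumes the 256 × 256 matrix spans the full field of view in both sets, with no zero-filling or interpolation, which the text does not state explicitly.

```latex
\Delta x = \frac{\mathrm{FOV}}{N_{\text{matrix}}}, \qquad
\Delta x_{\text{axial}} = \frac{42\,\text{mm}}{256} \approx 0.164\,\text{mm}, \qquad
\Delta x_{\text{coronal}} \times \Delta y_{\text{coronal}}
  = \frac{70\,\text{mm}}{256} \times \frac{50\,\text{mm}}{256}
  \approx 0.273\,\text{mm} \times 0.195\,\text{mm}
```

With 1-mm slices, an axial voxel would therefore be about 0.164 × 0.164 × 1 ≈ 0.027 mm³.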

Metabolic measurements. Food intake and locomotor activity were measured using the PhenoMaster system (TSE-Systems), which consists of a combination of sensitive feeding sensors for automated measurement and a photobeam-based activity monitoring system that detects and records ambulatory movements, including rearing and climbing, in each cage. All parameters were measured continuously and simultaneously. Before data acquisition, mice were trained while singly housed in identical cages.

Triglycerides, total cholesterol and high-density lipoprotein (HDL) levels were 
measured in mouse serum by SpotChem EZ Chemistry Analyzer (Arkray). LDL 
levels were calculated using the Friedewald formula. 
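For reference, the Friedewald estimate is LDL = total cholesterol - HDL - triglycerides/5 when all values are in mg/dl (triglycerides/2.2 in mmol/l), and it is conventionally not applied when triglycerides reach 400 mg/dl. The text does not state the units used, so this function is a sketch under that assumption; the function name and example values are hypothetical.

```python
def friedewald_ldl(total_cholesterol, hdl, triglycerides, units="mg/dl"):
    """Estimate LDL cholesterol with the Friedewald formula:
    LDL = total cholesterol - HDL - triglycerides / k,
    where k = 5 for mg/dl and k = 2.2 for mmol/l. The estimate is conventionally
    not used when triglycerides are >= 400 mg/dl (about 4.5 mmol/l)."""
    if units == "mg/dl":
        k, tg_limit = 5.0, 400.0
    elif units == "mmol/l":
        k, tg_limit = 2.2, 4.5
    else:
        raise ValueError("units must be 'mg/dl' or 'mmol/l'")
    if triglycerides >= tg_limit:
        raise ValueError("triglycerides too high for the Friedewald estimate")
    return total_cholesterol - hdl - triglycerides / k


if __name__ == "__main__":
    # Hypothetical serum values, not data from this study.
    print(friedewald_ldl(200, 50, 100))  # 130.0
```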

Concentrations of leptin (Mouse Leptin DUO set, R&D Systems) and insulin 
(Ultra-sensitive mouse insulin ELISA kit, Crystal Chem) in the serum were meas- 
ured using ELISA according to the manufacturer's instructions. 

Taxonomic microbiota analysis. Frozen faecal samples were processed for 
DNA isolation using the MoBio PowerSoil kit according to the manufactur- 
er’s instructions. 1 ng of purified faecal DNA was used for PCR amplification. 
Amplicons spanning the variable region 1/2 (V1/2) of the 16S rRNA gene were 
generated by using the following barcoded primers: forward, 5′-XXXXXXXXAGAGTTTGATCCTGGCTCAG-3′; reverse, 5′-TGCTGCCTCCCGTAGGAGT-3′, where X represents a barcode base. Amplicons spanning the variable region 3/4 (V3/4) of the 16S rRNA gene (Fig. 2h and Extended Data Fig. 5b-d) were generated by using the following primers: forward, 5′-GTGCCAGCMGCCGCGGTAA-3′; reverse, 5′-GGACTACHVGGGTWTCTAAT-3′. The reactions were subsequently pooled and cleaned (PCR clean kit, Promega), and the PCR products were then sequenced on an Illumina MiSeq with 500-bp paired-end reads. The reads were then processed using the QIIME (Quantitative Insights Into Microbial Ecology, http://www.qiime.org) analysis pipeline as described**-*°. In brief, fasta quality files and a mapping file indicating the barcode sequence corresponding to each sample were used as inputs, reads were split by samples according to the barcode, taxonomical classification was performed using the RDP-classifier, and an OTU table was created. OTUs were mapped using the Greengenes database. Rarefaction was used to exclude samples with an insufficient number of reads. Sequences sharing 97% nucleotide sequence identity in the 16S region were binned into operational taxonomic units (97% ID OTUs). For beta diversity, unweighted UniFrac measurements were plotted according to the two principal coordinates based on >10,000 reads per sample. For microbial distance measurements, unweighted UniFrac distances were compared.
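Rarefaction, used here to drop under-sequenced samples, can be sketched as subsampling each sample's reads without replacement to a fixed depth, for example the 10,000 reads per sample mentioned for the UniFrac analysis. This is a hypothetical NumPy helper, not QIIME's implementation; it relies on `numpy.random.Generator.multivariate_hypergeometric` (NumPy 1.18 or later).

```python
import numpy as np


def rarefy(counts, depth, seed=0):
    """Rarefy an OTU count table to `depth` reads per sample.

    counts: (n_samples, n_otus) non-negative integer counts.
    Samples with fewer than `depth` reads are excluded; every remaining sample
    is subsampled without replacement to exactly `depth` reads.
    Returns (rarefied_counts, kept_sample_indices).
    """
    counts = np.asarray(counts, dtype=np.int64)
    rng = np.random.default_rng(seed)
    kept, rows = [], []
    for i, row in enumerate(counts):
        if row.sum() < depth:
            continue  # insufficient reads: drop the sample
        # Drawing reads without replacement is a multivariate hypergeometric draw.
        rows.append(rng.multivariate_hypergeometric(row, depth))
        kept.append(i)
    if rows:
        table = np.vstack(rows)
    else:
        table = np.zeros((0, counts.shape[1]), dtype=np.int64)
    return table, kept
```

Usage would be along the lines of `rarefied, kept = rarefy(otu_counts, depth=10000)`, after which distances are computed only on the kept samples.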

Classification of obesity history. Mouse obesity history was predicted using 
Random Forest Classification (sklearn 0.15.2) with the features being the rela- 
tive abundances of 16S OTUs as outputted by QIIME. Classification was made in 
leave-one-out cross-validation in which each mouse was classified as negative or 
positive for obesity history. 
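A minimal sketch of leave-one-out random-forest classification of obesity history, written against the current scikit-learn API (the original analysis used sklearn 0.15.2, which predates `sklearn.model_selection`). The function name and `n_estimators=500` are assumptions; the source does not report the forest's hyperparameters.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneOut


def loo_predict_obesity_history(otu_table, history, n_estimators=500, random_state=0):
    """Leave-one-out random-forest classification of obesity history.

    otu_table: (n_mice, n_otus) relative abundances; history: 0/1 labels.
    Each mouse is classified by a forest trained on all other mice only.
    """
    X = np.asarray(otu_table, dtype=float)
    y = np.asarray(history)
    predicted = np.empty_like(y)
    for train_idx, test_idx in LeaveOneOut().split(X):
        clf = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state)
        clf.fit(X[train_idx], y[train_idx])
        predicted[test_idx] = clf.predict(X[test_idx])
    return predicted
```

Training a fresh model for every left-out mouse is what keeps each prediction free of information about that mouse.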

Prediction of weight regain following HFD feeding. Future weight gain of mice was predicted in leave-one-out cross-validation, whereby the weight regain of each mouse upon HFD exposure was predicted by a gradient-boosting regression algorithm trained on data from all other mice, consisting of their 16S rDNA OTU data at the post-dieting nadir period, their obesity history, and their individual weight regain upon HFD exposure. Importantly, each time a mouse was left out and its weight regain was predicted, its obesity history was not given as an input to our algorithm. For each left-out mouse, a classifier of obesity history was first learned and used to classify the obesity history of the left-out mouse as described above. Then, training-data mice with the same obesity history as the left-out mouse were taken, and gradient-boosting regression (GBR, sklearn 0.15.2) was applied to learn a model that predicts their weight regain on the HFD. The input to this model consisted of the 16S OTUs, which were used within the GBR algorithm to predict weight regain.
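The two-step scheme can be sketched as follows, assuming a random forest for the history classifier (as in the previous section) and default `GradientBoostingRegressor` settings; the hyperparameters, the function name and the synthetic data are hypothetical. Accuracy is summarized as the Pearson correlation between predicted and observed regain, which is one way to obtain R values like those reported for Fig. 4c.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestClassifier


def two_step_loo_regain(otu_table, history, regain, random_state=0):
    """Leave-one-out two-step prediction of weight regain.

    For each left-out mouse: (1) predict its obesity history from OTUs with a
    classifier trained on the other mice (its true history is never used);
    (2) fit a gradient-boosting regressor on the training mice whose history
    matches the predicted one, and predict the left-out mouse's regain.
    """
    X = np.asarray(otu_table, dtype=float)
    h = np.asarray(history)
    y = np.asarray(regain, dtype=float)
    n = len(y)
    predictions = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i
        clf = RandomForestClassifier(n_estimators=100, random_state=random_state)
        clf.fit(X[train], h[train])
        h_hat = clf.predict(X[i:i + 1])[0]
        # Training mice sharing the predicted history (at least one exists,
        # because the classifier only outputs labels seen in training).
        same = train & (h == h_hat)
        reg = GradientBoostingRegressor(random_state=random_state)
        reg.fit(X[same], y[same])
        predictions[i] = reg.predict(X[i:i + 1])[0]
    return predictions


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((30, 5))                  # hypothetical OTU features
    history = (X[:, 0] > 0.5).astype(int)    # hypothetical obesity history
    regain = 3 * history + 2 * X[:, 1]       # hypothetical weight regain
    pred = two_step_loo_regain(X, history, regain)
    # Prediction accuracy as a Pearson correlation between predicted and observed.
    print(round(float(np.corrcoef(pred, regain)[0, 1]), 2))
```

Splitting the regressor by predicted history lets each model learn regain within a more homogeneous group, while the left-out mouse's true label never leaks into its own prediction.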

Metagenomic analysis. Metagenomic reads containing Illumina adapters, as well as low-quality reads, were filtered out, and low-quality read edges were trimmed. Host DNA was detected and excluded by mapping with GEM to the mouse genome with inclusive parameters. Length-normalized relative abundances (RA) of genes, obtained by similar mapping with GEM to a reference catalogue*!, were assigned to KEGG Orthology (KO) entries*”, and these were then normalized to a sum of 1. RA of KEGG modules and pathways was calculated by summation. Only samples with >100,000 metagenomic reads were considered for analysis.

Quality control of metagenomic reads and removal of host DNA. We applied Trimmomatic*? with the following parameters:

ILLUMINACLIP:<TruSeq3 adapters FASTA file>:2:30:10 LEADING:25 TRAILING:25 MINLEN:50

We removed host DNA by mapping to the mouse genome (mm10, downloaded from https://genome.ucsc.edu) and removing any mapped reads (see section below).

Mapping of metagenomic sequencing reads. Mapping was performed using the 
GEM mapper™ with the following parameters:

-q offset-33 --gem-quality-threshold 26 -m 0.1 -e 0.1 --min-matched-bases 0.8 --max-big-indel-length 15 -s 3 -d 'all' -D 1 -v -T 2 -p -E 0.3 --max-extendable-matches 'all' --max-extensions-per-match 5

with the modifier -m set to 0.05 for mapping to the mouse genome. Resulting mappings were retained as long as they had at least 50 matched bases with a minimal quality of 26.

Genetic content relative abundance calculation. Reads were mapped to the 
integrated reference catalogue of the human gut microbiome”". For each gene in 
the catalogue, the fraction of reads mapped to it from each sample was counted 
and normalized by gene length in kilobases. Reads mapping to more than one 
location were split so that each location received an equal fraction of the mapped 
read. Mapped reads were subsequently assigned to KEGG Orthology (KO) 
entries using the gene annotation table available at http://meta.genomics.cn/. 
Relative gene abundances were then calculated by normalizing the KEGG genes 
of each sample to sum to 1. To calculate the abundances of KEGG pathways 
and modules, the relative abundance of genes in each pathway and module was 
summed. 
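The abundance calculation described above can be sketched in plain Python. The function names and toy mappings are hypothetical, and the sketch simplifies by assuming each gene carries at most one KO annotation. As described, multi-mapped reads are split equally, counts are divided by gene length in kilobases, KO abundances are normalized to sum to 1, and pathway or module abundances are sums over their KOs.

```python
from collections import defaultdict


def gene_abundance(read_hits, gene_length_bp):
    """Length-normalized gene abundance for one sample.

    read_hits: iterable of lists; each list holds the gene IDs one read mapped to.
    A read mapping to k genes contributes 1/k to each; counts are then divided by
    gene length in kilobases.
    """
    counts = defaultdict(float)
    for genes in read_hits:
        if not genes:
            continue  # unmapped read
        share = 1.0 / len(genes)
        for gene in genes:
            counts[gene] += share
    return {g: c / (gene_length_bp[g] / 1000.0) for g, c in counts.items()}


def ko_relative_abundance(gene_abund, gene_to_ko):
    """Sum gene abundances per KO entry and normalize so the sample sums to 1."""
    ko = defaultdict(float)
    for gene, value in gene_abund.items():
        if gene in gene_to_ko:  # genes without a KO annotation are ignored
            ko[gene_to_ko[gene]] += value
    total = sum(ko.values())
    return {k: v / total for k, v in ko.items()} if total > 0 else {}


def pathway_abundance(ko_abund, pathway_to_kos):
    """Abundance of each KEGG pathway or module = sum of its KOs' abundances."""
    return {p: sum(ko_abund.get(k, 0.0) for k in kos) for p, kos in pathway_to_kos.items()}


if __name__ == "__main__":
    reads = [["g1"], ["g1", "g2"], ["g3"]]       # hypothetical mappings
    lengths = {"g1": 1000, "g2": 2000, "g3": 500}
    genes = gene_abundance(reads, lengths)        # {'g1': 1.5, 'g2': 0.25, 'g3': 2.0}
    kos = ko_relative_abundance(genes, {"g1": "K1", "g2": "K1", "g3": "K2"})
    print(pathway_abundance(kos, {"P1": ["K1", "K2"], "P2": ["K2"]}))
```

Because a KO can belong to several pathways, pathway abundances computed this way need not sum to 1.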

Non-targeted metabolomics. Caecal samples were collected, immediately frozen in liquid nitrogen and stored at −80 °C. Sample preparation and analysis were performed by Metabolon Inc. Samples were prepared using the automated MicroLab STAR system (Hamilton). To remove protein, dissociate small molecules bound to protein or trapped in the precipitated protein matrix, and recover chemically diverse metabolites, proteins were precipitated with methanol. The resulting extract was divided into five fractions: one for analysis by UPLC-MS/MS with positive ion mode electrospray ionization, one for analysis by UPLC-MS/MS with negative ion mode electrospray ionization, one for analysis on the LC polar platform, one for analysis by GC-MS, and one reserved for backup. Samples were
placed briefly on a TurboVap (Zymark) to remove the organic solvent. For LC, 
the samples were stored overnight under nitrogen before preparation for analysis. 
For GC, each sample was dried under vacuum overnight before preparation for 
analysis. 

Data extraction and compound identification. Raw data were extracted, peak-identified and QC processed using Metabolon’s hardware and software.
Compounds were identified by comparison to library entries of purified standards 
or recurrent unknown entities. 

Metabolite quantification and data normalization. Peaks were quantified using 
area-under-the-curve. For studies spanning multiple days, a data normalization 
step was performed to correct variation resulting from instrument inter-day tuning 
differences. 
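The inter-day correction is not specified beyond the description above. One common approach, assumed here rather than taken from Metabolon's documentation, is to divide each compound's peak areas by that compound's median within the same run day. The helper name is hypothetical.

```python
import numpy as np


def median_scale_by_day(peak_areas, run_day):
    """Divide each compound's peak areas by that compound's median on the same run day.

    peak_areas: (n_samples, n_compounds) AUC values (NaN = not detected).
    run_day: one label per sample indicating the instrument run day.
    After scaling, each compound has a median of 1 within every run day.
    """
    X = np.asarray(peak_areas, dtype=float)
    days = np.asarray(run_day)
    scaled = np.empty_like(X)
    for day in np.unique(days):
        rows = days == day
        day_median = np.nanmedian(X[rows], axis=0)
        scaled[rows] = X[rows] / day_median
    return scaled
```

Scaling within each run day removes day-specific shifts in instrument response, provided that samples from all experimental groups are spread across days.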

Flavonoid measurements. Apigenin and naringenin were measured by a Waters TQ MS detector combined with a Waters Acquity UPLC system. The chromatographic separation was carried out on a BEH C18 column (1.7 µm, 2.1 × 100 mm, Waters). The solvent flow rate was 0.3 ml min⁻¹. The mobile phase consisted of 0.1% formic acid (FA) in 5% acetonitrile (A) and 0.1% FA in acetonitrile (B) using a gradient program described below. The autosampler was cooled to 12 °C and the column heated to 35 °C. The MS detector (Waters TQ) was equipped with an ESI source used in positive mode (capillary voltage 2.5 kV). The measurement was performed in MRM mode, with two MRM traces for each compound (one for quantification and the second for identification). The cone voltages (V) and collision energies (eV) for each MRM transition, as determined by direct injection, are summarized below. Data were processed with MassLynx software with TargetLynx (version 4.1, Waters).

For sample preparation, 50 mg of stool or 100 mg of food were weighed into 2-ml safe-lock Eppendorf tubes. Samples were homogenized using a bead beater with metal balls. Three hundred microlitres of 80% methanol in DDW were added to the samples, followed by sonication for 20 min, centrifugation, and filtering through Acrodisc PTFE 0.2-µm filters (P/N 4552T) into vials.

See Supplementary Table 4 for chromatographic conditions for flavonoid 
separation. 
Gene expression analysis. Tissues were preserved in RNAlater solution (Ambion) and subsequently homogenized in Tri Reagent (Sigma Aldrich). RNA was purified using standard chloroform extraction. Two micrograms of total RNA were used to generate cDNA (High-Capacity cDNA Reverse Transcription kit; Applied Biosystems). Real-time PCR was performed using the following Ucp1 primers: Ucp1 forward, 5′-GGCCTCTACGACTCAGTCCA-3′; Ucp1 reverse, 5′-TAAGCCGGCTGAGATCTTGT-3′. Primers for flavanone 4-reductase and chalcone synthase were tested using http://insilico.ehu.eus/PCR/ and validated using cultures of Lactococcus lactis and Escherichia coli.

PCR was performed using the Kapa SYBR qPCR kit (Kapa Biosystems) on a ViiA 7 instrument (Applied Biosystems). PCR conditions were 95 °C for 20 s, followed by 40 cycles of 95 °C for 3 s and 60 °C for 30 s. Data were analysed using the ΔΔCt method with Hprt serving as the reference housekeeping gene. Hprt Ct values were verified to be insensitive to the experimental conditions.
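The ΔΔCt calculation can be sketched as follows, with Ucp1 as the target and Hprt as the reference gene. The choice of calibrator (here, a control group such as vehicle-treated mice) and the averaging of replicate Ct values are assumptions, the method presumes near-100% amplification efficiency for both genes, and the function name and Ct values are hypothetical.

```python
from statistics import mean


def relative_expression_ddct(target_ct, reference_ct, calib_target_ct, calib_reference_ct):
    """Relative expression by the 2^-ΔΔCt method.

    target_ct / reference_ct: replicate Ct values of the gene of interest (e.g. Ucp1)
    and of the reference gene (e.g. Hprt) in the sample.
    calib_*: the same for the calibrator (e.g. the vehicle-treated group).
    Assumes ~100% amplification efficiency for both genes.
    """
    delta_sample = mean(target_ct) - mean(reference_ct)
    delta_calib = mean(calib_target_ct) - mean(calib_reference_ct)
    delta_delta = delta_sample - delta_calib
    return 2.0 ** (-delta_delta)


if __name__ == "__main__":
    # Hypothetical Ct values: Ucp1 crosses threshold 2 cycles earlier relative to Hprt.
    print(relative_expression_ddct([22.0], [20.0], [24.0], [20.0]))  # 4.0
```

Each cycle of difference corresponds to a two-fold change, which is why a ΔΔCt of -2 yields a four-fold induction.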

Western blot analysis. Brown adipose tissue samples were excised and washed 
thoroughly with PBS, homogenized in RIPA buffer containing protease inhibitors, 
incubated for 20 min at 4 °C and centrifuged for 20 min at 14,000 r.p.m. and 4 °C. Samples were separated on 12% acrylamide gels and transferred onto nitrocellulose membranes. Western blot analysis was performed using anti-UCP1 (M-17)
polyclonal antibody (Santa Cruz, sc-6529) and donkey anti-goat antibody (Jackson 
ImmunoResearch, 705-035-003). Band density was calculated using ImageJ soft- 
ware. See Supplementary Fig. 4 for immunoblot source data. 

Adipose tissue explants. Brown adipose tissue was excised and rinsed with PBS. 
The tissue explants were cultured with 0, 10 or 100 µM of apigenin and naringenin for 24 h in DMEM medium containing 10% FBS, L-glutamine, penicillin,
and streptomycin at 37°C. Explants were then collected and immediately processed 
for qPCR and western blot analysis as described above. 

Statistical analysis. Data are expressed as mean ± s.e.m. Comparisons between two
groups were performed using two-tailed Mann-Whitney U-test. ANOVA was used 
for comparison between multiple groups. Statistical testing was performed using 
GraphPad Prism software. K-means clustering based on Pearson's correlation was 
used to classify the temporal behaviour of OTUs, metagenomes, and metabolites 
in weight-cycling mice after normalization to control mice in order to account for 
ageing-induced changes. P values <0.05 were considered significant. *P < 0.05; 
**P < 0.01; ***P< 0.001; ****P < 0.0001. Exact P values for each experiment can 
be found in Supplementary Table 6. 
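The analyses were run in GraphPad Prism. As a hypothetical stand-in, the two-group comparison and the asterisk convention can be sketched in Python as follows. The function names are illustrative, and the P value uses the large-sample normal approximation without tie or continuity correction, so it will differ somewhat from the exact test that statistics packages typically apply to small groups.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U for group x
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p

def stars(p):
    """Map a P value to the asterisk convention used in the figure legends."""
    for cutoff, label in ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p < cutoff:
            return label
    return "n.s."

# Hypothetical measurements from two groups of five mice
u, p = mann_whitney_u([1.0, 2.0, 3.0, 4.0, 5.0], [6.0, 7.0, 8.0, 9.0, 10.0])
print(u, round(p, 4), stars(p))
```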

Data availability. The sequencing data have been deposited at the European
Nucleotide Archive database with the accession number PRJEB17697 and are
available from the corresponding authors upon request.
Extended Data Figure 1 | Metabolic measurements during weight
regain. a, Schematic indicating time point of metabolic measurements. b,
Quantification of weight regain by weight gain. c, Net weight gain induced
by 8 weeks of HFD in weight-cycling mice and continuous HFD controls. d, e,
Coronal (above) and axial (below) MRI scans (d), and quantification of
body fat content (e). f-h, Serum levels of leptin (f), LDL (g), and HDL (h)
during the second HFD exposure of mice undergoing weight cycling and
controls. i, j, Quantification of dark phase (i) and light phase (j)
energy expenditure upon weight regain of weight-cycling mice.

k-q, Representative recordings (k, n, q) and quantifications (l, m, o, p) of
O2 consumption (k-m), CO2 consumption (n-p), and food intake (q)
upon weight regain of weight-cycling mice. Experiments were repeated
twice. Shown are mean ± s.e.m. *P < 0.05; **P < 0.01; ***P < 0.001;
****P < 0.0001 by ANOVA (e-p) or Mann-Whitney U-test (b, c).

See Supplementary Tables 5 and 6 for exact n and P values. 
Extended Data Figure 2 | Enhanced recurrent weight gain after initial
obesity. a, b, The effect of celastrol on weight loss in mice continuously
fed a HFD (a) and mice with alternating diets (b). c-e, Weight regain
quantification by AUC (c), weight regain slope (d), and net weight gain on
HFD (e) by control mice and weight-cycling mice treated with celastrol to
lose weight. f, g, Quantification of weight regain by AUC (f), weight regain
slope (g) of leptin antagonist-treated weight-cycling mice and controls.

h, i, Body fat content (h) and serum cholesterol levels (i) of weight-cycling
mice undergoing a third weight cycle and controls. j, Schematic of the
analysed time point in k-o. k, Weight gain during 4 weeks of HFD.
l-o, Glucose levels after glucose tolerance test (l), glucose level
quantification (m), serum cholesterol levels (n), and quantification of
body fat content (o) in weight-cycling mice during initial obesity and
controls. Experiments were repeated twice. Shown are mean ± s.e.m.
*P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001 by ANOVA (c, f, h,
i, m, n, o) or Mann-Whitney U-test (d, e, g). See Supplementary Tables 5
and 6 for exact n and P values.
Extended Data Figure 3 | Recovery of metabolic parameters after
dieting. a, Schematic of the analysed nadir time point. b-f, Body fat
content (b), serum cholesterol levels (c), glucose levels after glucose
tolerance test (d), glucose level quantification (e), and serum insulin
levels (f) in weight-cycling mice upon return to normal weight.
Data are mean ± s.e.m.; n.s., not significant by ANOVA. See Supplementary Table 5
for exact n values.
Extended Data Figure 4 | Metabolic measurements after dieting.
Representative recordings (a, d, g, j), dark phase (b, e, h, k), and light
See Supplementary Table 5 for exact n values.
Extended Data Figure 5 | Persistent microbiome changes after dieting.
a, Quantification of UniFrac distances of weight-cycling mice from NC
controls at the indicated time points. Plots correspond to the PCoA
analyses in Fig. 2b-f. b-d, PCoA analyses and distance quantification
(insets) of V3/V4-targeted 16S sequencing of mice before (b), during (c),
and after (d) diet-induced obesity and subsequent weight loss.
e-i, Examples of OTUs whose abundance does (e, f) or does not (g-i)
recover after dieting. j-m, PCoA analyses (j, k), UniFrac distance (l) and
alpha diversity (m) at the nadir time point of post-dieting mice that had
received celastrol to accelerate weight loss. Experiments were repeated
twice. Data are mean ± s.e.m. *P < 0.05, **P < 0.01, ***P < 0.001,
****P < 0.0001 by ANOVA. See Supplementary Tables 5 and 6 for exact
n and P values.
Extended Data Figure 6 | Persistent metagenomic changes after
dieting. a, Heatmap of normalized gene abundance in the microbiota
of mice before, during, and after obesity. b, c, Examples of genes whose
abundance does not recover after dieting. d, PCA of bacterial KEGG
modules over time in weight-cycling mice and controls. e-j, Examples of
KEGG pathways whose abundance is reversibly decreased (e, f), reversibly
increased (g, h), or persistently decreased (i, j) during obesity and dieting.
Data are from one experiment. Data are mean ± s.e.m. *P < 0.05,
**P < 0.01, ***P < 0.001, ****P < 0.0001 by ANOVA. See Supplementary
Tables 5 and 6 for exact n and P values.
recurrent obesity. a, b, PCoA (a) and alpha diversity (b) of faecal
microbiota after dieting (nadir time point at week 8) from mice with or
without antibiotic treatment during weight loss. c, Net weight gain
induced by 8 weeks of HFD in weight-cycling mice or continuous HFD
controls with or without antibiotic treatment between weeks 4 and 8.
d, e, Glucose levels after oral glucose tolerance test (d) and glucose level
quantification after glucose tolerance test (e) during weight regain in mice
with or without antibiotic treatment during weight loss (n = 4-10).
f-h, PCoA of faecal microbiota from formerly obese mice and controls
at the time of dieting-induced weight normalization (f), 15 weeks after
weight normalization (g), and 21 weeks after weight normalization (h).
i-k, Correlation analysis (i) and examples (j, k) of microbial taxa
undergoing gradual normalization in abundance over a period of
21 weeks after weight normalization. l, Quantification of secondary
weight gain after microbiota normalization. Experiments were repeated
twice. Data are mean ± s.e.m. **P < 0.01, ***P < 0.001 by ANOVA.
See Supplementary Tables 5 and 6 for exact n and P values.
Extended Data Figure 8 | Transfer, prediction and treatment of weight
regain by microbiome features. a, Schematic of microbiota transfer to
germ-free mice after dieting. Recipients were fed either a HFD or NC.
b-d, PCoA of recipient microbiota (b) and relative UniFrac distances to
NC controls (c, d) of germ-free mice one week after transplantation with
microbiota from weight-cycling mice or controls, and fed either NC (c) or
a HFD (d). e, f, Quantifications of weight gain (e) and blood glucose after
glucose tolerance test (f) in germ-free recipients of microbiota from
weight-cycling mice or controls. g, h, Correlation of predicted and measured
weight gain when prediction is based solely on inferred history of
obesity (g) or solely on 16S sequencing (h). i-k, PCoA of faecal microbiota
(i) and relative UniFrac distances between donors (j) and recipients (k)
two weeks after the onset of daily FMT from cycHFD or NC mice to mice
undergoing weight cycling. l-o, Quantification of secondary weight gain
(l), net weight gain induced by 8 weeks of HFD feeding (m), glucose levels
after glucose tolerance test (n) and lean mass (o) in weight-cycling mice
and controls with or without FMT. Experiments were repeated twice. Data
are mean ± s.e.m. n.s., not significant. *P < 0.05, **P < 0.01,
****P < 0.0001 by ANOVA (e, f, l, m, o) or Mann-Whitney U-test
(c, d, j, k). See Supplementary Tables 5 and 6 for exact n and P values.
Extended Data Figure 9 | Persistent metabolomic changes after
dieting. a-h, Examples of metabolites whose abundance is reversibly
decreased (a, b), reversibly increased (c, d), persistently decreased (e, f),
or persistently increased (g, h) during obesity and dieting. i, Schematic of
flavonoid biosynthetic pathways leading to the production and conversion
of naringenin. KEGG IDs of key enzymes are indicated. Genes found
in our metagenomic dataset are indicated in green. Data are from one
experiment. Shown are mean ± s.e.m. *P < 0.05, **P < 0.01, ***P < 0.001,
****P < 0.0001 by ANOVA. See Supplementary Tables 5 and 6 for exact n
and P values.
Extended Data Figure 10 | Microbiota control of post-dieting
metabolic complications through intestinal flavonoids. a, Schematic
showing sampling times in obesity/recovery cycle experiment. b, Dietary
flavonoids in NC and HFD. c-e, Abundance of flavanone 4-reductase (c),
chalcone synthase (d), and eriodictyol (e) over time in the faeces of mice
undergoing weight cycling normalized to controls. f, Quantification of
flavanone 4-reductase levels in faecal DNA relative to host DNA in
weight-cycling mice at the end of the weight loss period, with or without antibiotic
treatment during weight loss. g, Schematic of sampling time upon weight
regain. h, i, Abundance of apigenin (h) and naringenin (i) in the faeces of
mice undergoing post-dieting weight regain and controls. j-m, Flavonoid
levels (j), PCoA of faecal microbiota (k), net weight gain induced by 8
weeks of HFD (l), and weight regain quantification by AUC (m) of
weight-cycling mice supplemented with apigenin and naringenin during
weight regain. Data are from one (a-k) or two (l, m) experiments. Shown
are mean ± s.e.m. *P < 0.05, **P < 0.01 by ANOVA (c, e, l) or
Mann-Whitney U-test (h, m). See Supplementary Tables 5 and 6 for exact
n and P values.
Extended Data Figure 11 | Metabolic measurements in flavonoid-treated
mice. Representative recordings (a, d, g, j), dark phase
quantifications (b, e, h, k), and light phase quantifications (c, f, i, l) of O2
consumption (a-c), CO2 consumption (d-f), respiratory exchange ratio
(g-i), and physical activity (j-l) of weight-cycling mice with or without
supplementation of apigenin and naringenin (A/N) during weight regain.
Data are from one experiment. Data are mean ± s.e.m. **P < 0.01,
***P < 0.001 by ANOVA. See Supplementary Tables 5 and 6 for exact
n and P values.
Extended Data Figure 12 | Metabolic measurements in flavonoid- and
antibiotic-treated mice. Representative recordings (a, d), dark phase
quantifications (b, e), and light phase quantifications (c, f) of food (a-c)
and water consumption (d-f) of weight-cycling mice with or without
supplementation of apigenin and naringenin (A/N) during weight regain.
g, Schematic indicating time of metabolic measurements during the
weight regain phase. h, i, Quantifications of energy expenditure in
weight-cycling mice with or without antibiotic treatment during weight
loss. j-l, Representative recording (j) and quantifications (k, l) of O2
consumption by weight-cycling mice with or without antibiotic treatment
(Abx) during weight loss. Data are from one experiment. Shown are
mean ± s.e.m. *P < 0.05, ***P < 0.001 by ANOVA. See Supplementary
Tables 5 and 6 for exact n and P values.
Extended Data Figure 13 | Metabolic measurements in antibiotic-treated
mice. Representative recordings (a, d, g, j), dark phase
quantifications (b, e, h, k), and light phase quantifications (c, f, i, l) of CO2
consumption (a-c), respiratory exchange ratio (d-f), physical activity
(g-i) and food intake (j-l) by weight-cycling mice with or without
antibiotic treatment (Abx) during weight loss. Data are from one
experiment. Data are mean ± s.e.m. ***P < 0.01 by ANOVA.

See Supplementary Tables 5 and 6 for exact n and P values.
Gundula Haunschild, Miodrag Gužvić, Christian Reimelt, Michael Grauvogl, Norbert Eichner, Florian Weber,
Andreas D. Hartkopf, Florin-Andrei Taran, Sara Y. Brucker, Tanja Fehm, Brigitte Rack, Stefan Buchholz, Rainer Spang,
Gunter Meister, Julio A. Aguirre-Ghiso & Christoph A. Klein


Accumulating data suggest that metastatic dissemination often occurs early during tumour formation, but the
mechanisms of early metastatic spread have not yet been addressed. Here, by studying metastasis in a HER2-driven
mouse breast cancer model, we show that progesterone-induced signalling triggers migration of cancer cells from early
lesions shortly after HER2 activation, but promotes proliferation in advanced primary tumour cells. The switch from
migration to proliferation was regulated by increased HER2 expression and tumour-cell density involving
microRNA-mediated progesterone receptor downregulation, and was reversible. Cells from early, low-density lesions displayed
more stemness features, migrated more and founded more metastases than cells from dense, advanced tumours. Notably,
we found that at least 80% of metastases were derived from early disseminated cancer cells. Karyotypic and phenotypic
analysis of human disseminated cancer cells and primary tumours corroborated the relevance of these findings for
human metastatic dissemination.


Systemic cancer (the dissemination and subsequent distant outgrowth
of cells from a solid tumour) occurs in two phases: a clinically latent
stage of hidden cancer spread, followed by manifest metastasis. Manifest
metastasis remains mostly incurable. The period of clinically
undetectable minimal residual disease, defined by disseminated cancer cells
(DCCs) that are left behind after primary tumour surgery, offers a time
window to prevent metastasis)”. However, only circumstantial
knowledge is available about minimal residual disease, and consequently,
systemic (adjuvant) therapies improve outcome in only about 20% of
patients**. This situation indicates that our current understanding of
early systemic cancer is insufficient to prevent metastasis.

The first direct evidence for a characteristic biology of early
disseminated cancer and minimal residual disease came from analyses
of DCCs isolated from bone marrow of patients with breast cancer
before (M0 stage, according to Union for International Cancer Control
guidelines) and after (M1 stage) manifestation of metastasis>”,
indicating that M0-DCCs might have disseminated early and evolved
in parallel with the primary tumour’. Studies in transgenic mouse
models*"!° and in patients with pre-malignant lesions or in situ
carcinomas®"!? corroborated this concept, but the relevance of DCCs
remains contested’.

We therefore addressed the issue of breast-cancer-cell
dissemination soon after cancer initiation and investigated whether mechanisms
exist that reduce metastatic seeding from advanced cancers. Finally, we
investigated whether early DCCs (eDCCs) are able to form metastases.
We report on a mechanism involving cell density, HER2 and PGR
signalling that reconciles early and late dissemination models.


PGR and HER2 regulate gene expression in early lesions
In BALB-NeuT mice, dissemination starts shortly after expression of the
Her2 transgene (also known as Erbb2) at puberty (around 4 weeks of age),
when the first hyperplastic lesions become apparent®. From 4-9 weeks of
age, we observed micro-invasion®, and a sharp decline in the ratio of DCCs
to total tumour area (a measure of cell numbers at risk of dissemination)
during primary tumour growth (Extended Data Fig. 1a). The genetic
program governing dissemination from early lesions in microdissected
tissue samples (Extended Data Fig. 1b and Supplementary Table 1)
showed a distinct gene expression profile compared to healthy
mammary glands, primary tumours and lung metastases (Fig. 1a). We defined
1,278 gene transcripts unique to early lesions, of which 300 were highly
conserved between mouse and human (Supplementary Data 1).

We confirmed differential expression of selected transcripts by
quantitative PCR (qPCR) (Extended Data Fig. 1c) and analysed transcript
expression of steroid hormone receptors (strong candidate regulators;
Supplementary Tables 2, 3), all of which, except Esr1 (also known as
ERalpha), showed the highest expression in early lesions (Extended
Data Fig. 1d). When we assessed the expression of Ahnak, Baz2a,
Nfatc3, Nr3c1 and Nr3c2 genes, which were used as surrogate markers
of the early lesion signature (Fig. 1b, Supplementary Tables 2, 3 and
Supplementary Data 1), progesterone was the only steroid hormone
that activated a similar expression profile (Extended Data Fig. 1e),
but only in early lesion cells of 9-week-old BALB-NeuT mice, and not
in wild-type mammary or primary tumour cells (Fig. 1c). Moreover,
the expression of progesterone receptor B (PGR-B), which is the main
isoform expressed for mammary gland development'*"!%, correlated
Figure 1 | Identification of a gene expression signature linked to early
dissemination. a, Heatmap of genes that were differentially expressed
in different sample types: normal mammary glands from BALB/c
(N1-N10), early lesions (EL, A1-A7), primary tumours (PT, P1-P19)
and lung metastases (Met, M1-M7) from BALB-NeuT mice. Yellow,
upregulation; blue, downregulation. b, Five-gene surrogate signature
(qPCR) for the early lesion profile. c, Progesterone (P) activates the
early lesion signature in vitro. Data are mean ± s.d. d, TissueFAX


with low to moderate HER2 expression in early lesions and wild-type
mammary tissue (Extended Data Fig. 1f). We therefore quantified
HER2 and PGR protein expression in mammary cells displaying
normal, hyperplastic or advanced tumour morphology. The number
and staining intensity of HER2+ cells steadily increased with advancing
tumour morphology, in which early lesions had intermediate cell
numbers and HER2 expression levels. PGR staining intensity was
constant, but the number of PGR+ cells declined from approximately
40% in normal glands to 0% in primary tumours (Fig. 1d).

The early lesion signature could be activated in 4T1, an
MMTV-HER2-negative mouse tumorigenic cell line (Extended Data Fig. 1g)
that displays weak endogenous HER2 expression (Extended Data
Fig. 1h). Furthermore, HER2-negative mouse tumour cells (67NR and
MM3MG cells) did not express the early lesion signature (Extended
Data Fig. 1g), but progesterone treatment or transduction of the B
isoform of PGR (Pgr-B) induced upregulation of HER2 in 4T1 or
MM3MG cells, respectively (Extended Data Fig. 1h-i). Collectively,
these results suggested that the genetic program of early lesions depends
on the combined activation of progesterone and HER2 pathways.


Progesterone induces migration of early lesion cells 
Because progesterone mediates branching" in mammary gland
development, we investigated the role of the progesterone-induced early
lesion signature in cancer cell migration. We found that the mRNA
levels of the progesterone-induced paracrine signals (PIPS) Rankl (also
known as Tnfsf11) and Wnt4 were upregulated in early lesion samples
(Extended Data Fig. 2a). Treatment of early-lesion-derived cells with
PIPS mimicked the effect of progesterone (Extended Data Fig. 2b),
suggesting that early lesions exploit the mechanisms of mammary
branching for metastasis. Consistent with this, PGR+ cells were
enriched in distal ducts of normal mammary glands (advancing the
branching tree away from the nipple during developmental fat pad
invasion) compared to proximal ducts closer to the origin (more
differentiated ducts; Extended Data Fig. 2c, d).

Furthermore, progesterone and PIPS induced migration of
mammary cells from early-lesion-derived samples (freshly prepared or
early-lesion-derived mammospheres) and suppressed migration in
cells from primary tumours (Fig. 2a and Extended Data Fig. 2e, f).




cytometric quantification of HER2 and PGR protein expression. Images,
representative staining of HER2 (left) and PGR (right). Scale bars, 100 μm.
Mean HER2 staining intensity (red line, left histograms) in arbitrary units
(a.u.) and percentage of PGR+ cells (right histogram) and box plots. Boxes
show lower quartile, median and upper quartile and whiskers indicate
minimum and maximum; ****P < 0.0001 (t-test and Stouffer's combined
probability test (c) and one-way ANOVA (d)).
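Stouffer's method, cited in the legend above, combines independent P values by averaging their normal scores. A minimal Python sketch follows. It assumes unweighted, one-sided P values from independent tests; the legend does not state how the individual t-tests were oriented or weighted, and `stouffer` is a hypothetical helper name.

```python
from statistics import NormalDist

def stouffer(p_values):
    """Combine one-sided P values from independent tests (unweighted Stouffer Z).

    Each P (strictly between 0 and 1) is mapped to z_i = Phi^-1(1 - p_i);
    the combined Z = sum(z_i) / sqrt(k) is mapped back to a one-sided P.
    """
    nd = NormalDist()  # standard normal: mean 0, sd 1
    zs = [nd.inv_cdf(1 - p) for p in p_values]
    z_combined = sum(zs) / len(zs) ** 0.5
    return 1 - nd.cdf(z_combined)

# Two hypothetical tests, each with P = 0.05, give a combined P close to 0.01
print(round(stouffer([0.05, 0.05]), 4))
```

Because the combined score grows with the square root of the number of tests, several individually modest results pointing the same way can yield a much smaller combined P.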


PIPS also activate mammary stem cells during mammary gland
development’’, prompting us to analyse mammosphere formation’®.
Consistent with previous reports on HER2-stimulated stemness'””””,
BALB-NeuT-derived samples generated significantly higher numbers of
spheres than controls (Fig. 2b). Notably, normal mammary cells derived
from young (4-9-week-old) mice generated more spheres than cells
derived from older mice (Fig. 2b). Furthermore, early-lesion-derived
cells generated higher sphere numbers in response to progesterone and
PIPS than primary tumour samples (P < 0.01; Fig. 2c). Neutralizing
antibodies against RANKL and a WNT inhibitor abrogated the effect of
progesterone on early lesion cell migration and sphere formation
(Fig. 2d).

Migrating early lesion cells (that is, those arriving on the other side of
the transwell membrane) formed increased sphere numbers in response
to progesterone (Fig. 2e and Extended Data Fig. 2g). Oestrogen also
induced migration and sphere formation; however, the progesterone
inhibitor RU486 inhibited this pro-migratory and pro-sphere-forming
effect (Extended Data Fig. 2h), possibly because oestrogen acts via
transcriptional induction of Pgr*!. Together, these results suggested that moderate
HER2 expression together with progesterone or PIPS availability promotes
sphere formation and migratory responses in mammary epithelial cells.


HER2 expression levels determine cellular responses 

We next performed a series of mechanistic in vitro experiments using
mouse mammary epithelial MM3MG cells (oestrogen receptor (ERα)-negative
but expressing low levels of PGR and HER2). MM3MG cells
transduced with Pgr-B or Her2 were subjected to sphere-formation and
migration assays (Extended Data Fig. 3). PIPS-responsive cells (in both
migration and sphere-formation assays) were HER2low/PGR−, whereas
PGR+ cells themselves did not migrate, and HER2high cells resembled
non-migrating primary tumour cells and showed enhanced
proliferation.


Cell density regulates HER2 and PGR expression 

Notably, individual primary-tumour-derived cells re-expressed Pgr 
mRNA and protein when cultured at low cellular densities (Extended 
Data Fig. 4a, b). We therefore plated the BALB-NeuT primary-tumour- 
derived TUBO cell line at different cell densities and found PGR 
Figure 2 | Progesterone induces migration and sphere formation of
early lesion cells. a, Early lesion and primary tumour cells respond to
progesterone or PIPS (WNT4, RANKL) with increased (early lesions)
or decreased (primary tumours) migration. b, Mammosphere formation
depends on age and HER2 expression. c, Cells from early lesions and
primary tumours respond to progesterone or PIPS with increased


expression in 10 ± 5% (mean ± s.d.) of cells grown at low density,
but undetectable PGR expression in cells grown at high density 
(Fig. 3a). Several experiments suggested that there was a soluble factor 
with PGR-suppressing activity in the vesicular fraction of cell culture 
supernatants, and that this activity was conserved between mouse 


(early lesions) or decreased (primary tumours) sphere formation.
d, Depletion of PIPS by IWP-2 (WNT inhibitor) or anti-RANKL
(neutralizing antibody) reduces migration (left) and sphere formation
(right) of early lesion cells. e, Migrating cells activated by PIPS form
spheres (see also Extended Data Fig. 2g). Data are mean ± s.d.; *P < 0.05;
**P < 0.01; ***P < 0.001; ****P < 0.0001 (Student’s t-test).


and human (Extended Data Fig. 4b-e). To identify this activity, we 
analysed the microRNA (miRNA) profiles of supernatants with and 
without exosomes from TUBO cells and the HER2-overexpressing 
cell line MM3MG-Her2. miRNA sequencing (Supplementary Data 2) 
and bioinformatic prediction of Pgr regulators (Extended Data Fig. 4f) 
Figure 3 | Cell density regulates PGR expression and early lesion
phenotype. a, TUBO cells re-express PGR at low cell density. Scale bars,
50 μm. b, TUBO cells grown at high density upregulate miR-9-5p.
c, Expression of miR-9-5p in early lesion and primary tumour samples.
d, Primary tumour and TUBO cells generate the early lesion signature
only when grown at low density. e, Migration and sphere formation of
four human cell lines grown at low and high densities and treated with PIPS
(WNT4 and RANKL) or progesterone (P) (see also Extended Data Fig. 5).
f, Number of lung macro-metastases (17 weeks after tumour resection)
after tumour formation from transplanted tumour pieces (1 mm³; high
density) or 50 spheres in 40 μl Matrigel (low density) and primary tumour
surgery (shown are median and individual values). g, Mechanisms of local
tumour and distant metastasis formation as derived from in vitro and
in vivo (see Fig. 4) data. *P < 0.05; ****P < 0.0001 (t-test and Stouffer's
combined probability test (d); mean ± s.d. (b-e); Mann-Whitney
U-test (f)).
Figure 4 | Progesterone signalling regulates tumour formation and
dissemination in vivo. a, Tumour formation 8 weeks after transplantation
of spheres derived from primary tumours or early lesions into mammary
fat pads of wild-type BALB/c siblings. b, Percentage of mice with
DCCs (detected by cytokeratin staining) in bone marrow 8 weeks after
transplantation. c, Number of DCCs in bone marrow of mice 8 weeks after
transplantation. d, DCC counts in bone marrow in recipients transplanted
at different ages. e, Time to tumour formation after transplantation of
mammary glands (from 4-week-old mice; gland model) or tumour pieces
(from 20-22-week-old mice; tumour model). The red box highlights
mice from both models with similar tumour growth kinetics, which are
analysed separately in g and Extended Data Fig. 6o-q. f, Percentage of
mice with lung macro-metastasis from gland or primary tumour models.
g, Macro-metastasis formation in recipient mice with similar tumour


identified miR-30a-5p and miR-9-5p as abundantly expressed in 
TUBO cells and able to downregulate Pgr in T47D cells (Extended Data 
Fig. 4g, h). miR-30a-5p expression was upregulated by Her2 (Extended 
Data Fig. 4f, i) and miR-9-5p expression was sensitive to cell density 
(Fig. 3b, c). 

Low cell density induced early-lesion-like features (such as induction
of the early lesion signature, and reduced HER2 expression,
migration and sphere formation) in TUBO or primary-tumour-derived cells
from the BALB-NeuT model (Fig. 3d and Extended Data Fig. 4j-m).
Notably, several human breast cancer cell lines shared density
regulation of HER2- and PGR-regulating miRNAs (Extended Data Fig. 5a-d).
Moreover, we observed density regulation of migration and sphere
formation in 10 out of 10 human breast cancer cell lines tested (7 out
of 10 by PIPS and 3 out of 10 by progesterone induction; Fig. 3e and
Extended Data Fig. 5e).

To assess the contribution of cell density in vivo, we compared
metastasis formation of transplanted primary tumour pieces (very high cell
density) with primary tumourspheres (50 spheres injected in 40 μl
Matrigel, that is, very low density). After primary tumour formation,
we performed curative surgery and evaluated metastasis formation.
No difference was found in the percentage of mice with metastases
(8 out of 19 for tumour pieces compared with 6 out of 16 for spheres);
however, animals transplanted with primary tumourspheres had
a higher number of metastases (P < 0.05; Fig. 3f). In summary, we
obtained support for a model of metastatic dissemination regulated
by cell density, HER2 expression and progesterone signalling (Fig. 3g).
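The text reports no difference in the proportion of mice with metastases (8 of 19 versus 6 of 16) without naming the test used for that comparison. As an illustrative check only, a two-sided Fisher's exact test on these counts can be written with Python's standard library; the helper name is hypothetical, and this is not necessarily the analysis the authors performed.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(k):
        # Probability that the top-left cell equals k, given fixed margins
        return comb(row1, k) * comb(n - row1, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted
    total = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-7))
    return min(1.0, total)

# Rows: tumour pieces, spheres; columns: with metastases, without metastases
p = fisher_exact_two_sided(8, 19 - 8, 6, 16 - 6)
print(f"{8 / 19:.0%} vs {6 / 16:.0%} with metastases, two-sided P = {p:.2f}")
```

With these counts the observed table is the most likely one under the null hypothesis, so the test gives no evidence of a difference in the fraction of mice with metastases, in line with the text.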


PGR signalling regulates dissemination in vivo 

To validate these findings, we analysed physiological conditions of
reduced (higher age; Extended Data Figs 1f, 2a, 6a, b) and increased
(pregnancy) PGR signalling. We transplanted spheres generated
from early lesions and primary tumour samples into young (4 weeks
of age; PGR-rich) and old (40 weeks of age; PGR-reduced)
wild-type recipients. All mice were killed when the first mice developed
tumours of 5-10 mm diameter (8 weeks later). Most mice transplanted
growth kinetics (mice from the red box in e; see Extended Data Fig. 6o).
h, Example of a phylogenetic tree (mouse 3769). A1-A3, inferred common
ancestors; M1-M3, metastases 1-3; N, normal cell; P, primary tumour. The
ordinate indicates the number of aberrations per profile on a square root
scale. i, Aberration profiles along tree paths from N via A1-A3 to P or
M1-M3 in terms of aberration prototypes (see Extended Data Figs 7-9).
j, Distribution of relative time points of dissemination on a genetic
scale for all 44 primary tumour-metastases pairs. The red line indicates
dissemination after which 50% of primary tumour changes were acquired
as an arbitrary threshold for early versus late dissemination; see Extended
Data Fig. 7d. *P < 0.05; ***P < 0.001; ****P < 0.0001; NS, not significant
(Student’s t-test (a, c, d); Fisher’s exact test (b); χ2 test (f, g); Mann-Whitney
U-test (e)). Lines in a, c, d and e denote the median.
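The early-versus-late split in panel j can be illustrated with a short Python sketch. It rests on an assumption not spelled out in the legend: that a pair's relative dissemination time is the fraction of the primary tumour's aberrations already present in the common ancestor of the primary tumour and the metastasis, with values below 0.5 called early. The counts and function names are hypothetical.

```python
def relative_dissemination_time(shared_aberrations, primary_aberrations):
    """Fraction of the primary tumour's aberrations present at dissemination.

    Assumption (not taken from the paper's methods): the dissemination point is
    the last common ancestor of the primary tumour (P) and a metastasis (M), so
    aberrations shared by P and M approximate the changes acquired before spread.
    """
    if primary_aberrations == 0:
        raise ValueError("primary tumour has no aberrations to scale against")
    return shared_aberrations / primary_aberrations

def classify(rel_time, threshold=0.5):
    """Label dissemination early or late at the 50% threshold (boundary is an assumption)."""
    return "early" if rel_time < threshold else "late"

# Hypothetical P-M pairs: (aberrations shared with P, total aberrations in P)
pairs = [(3, 20), (12, 20), (5, 18), (9, 30)]
labels = [classify(relative_dissemination_time(s, t)) for s, t in pairs]
early_fraction = labels.count("early") / len(labels)
print(labels, f"{early_fraction:.0%} early")
```

Summarizing all pairs this way gives the share of metastases attributed to early dissemination; the actual figure depends on how the phylogenetic reconstruction places each ancestor.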


with primary tumours had palpable tumours, but none of the mice 
transplanted with early lesions (Fig. 4a), although these nevertheless 
harboured microscopic early lesions and/or ductal carcinoma in situ 
(DCIS) (Extended Data Fig. 6c, d). Notably, transplantation of spheres 
from early lesions increased the number of animals containing DCCs, 
and resulted in higher numbers of DCCs in bone marrow compared to 
transplantation of primary tumourspheres (Fig. 4b, c). 

As expected, dissemination from primary tumourspheres was suppressed, and
tumour formation stimulated, in the PGR-rich microenvironment of young
mammary glands (Fig. 4d and Extended Data Fig. 6e).
Dissemination from early lesions was not reduced in old
recipients (Fig. 4d). However, the transplanted early lesions generated
a PGR-rich microenvironment in old recipients (Extended Data
Fig. 6d), consistent with the observation that cells from early lesions,
but not from primary tumours, could generate PGR-positive cells in
3D culture (Extended Data Fig. 6f, g). Co-transplantation of
BALB-NeuT primary tumourspheres with MM3MG-Pgr-B cells into 40-week-old
mice reduced dissemination and stimulated tumour
formation (Extended Data Fig. 6h, i).

Because progesterone levels physiologically increase 10- to
20-fold during pregnancy, we mated female transgenic mice at the ages
at which early lesions (week 7) and early tumour formation (week 15)
occur. Mice were killed at term. Those mated at the early lesion
age displayed higher numbers of DCCs (Extended Data Fig. 6j),
whereas those mated at week 15 formed large tumours within 3 weeks,
faster than unmated controls (Extended Data Fig. 6k).

To assess metastasis from early lesions and primary tumour lesions,
we transplanted pieces of mammary glands from 4-5-week-old transgenic
mice (gland model) or pieces of primary tumours (primary tumour model)
into the cleared mammary fat pad of 4-week-old wild-type recipients
(Extended Data Fig. 6l, m). Tumours grew to 5-10 mm quickly in animals
transplanted with primary tumour pieces (indicating their high
viability) and more slowly in animals in the gland model (Fig. 4e).
After surgical removal of primary tumours (Extended Data Fig. 6l, m),
more mice displayed metastasis in the gland model (Fig. 4f; P < 0.0001),
Figure 5 | PGR and HER2 signalling, and dissemination in patients
with breast cancer. a, An increase in tumour diameter is not accompanied
by an increase in DCCs in bone marrow. b, PGR and HER2 expression
identifies a HER2^high/PGR^high subgroup of patients with the highest
seeding rates. c, Comparison of HER2^amp/PGR^high human breast cancers
and primary BALB-NeuT mouse tumours for HER2/PGR staining. Top,
representative images of an early lesion (left) and primary tumour (right)
from the BALB-NeuT model. Bottom, representative images of two
patients. High-density regions (strong HER2 and low PGR expression,
black arrows) and regions of invasive cells (strong PGR and HER2
expression, blue arrows) are shown. Scale bars, 1 mm (top, bottom right)
and 200 µm (bottom left). d, PGR-downregulating miRNAs are repressed


although numbers of metastatic foci were similar in both cases
(Extended Data Fig. 6n). Because the slower growth of early lesions
before surgery might have extended the time available for metastasis
formation (assuming that dissemination occurred early), we restricted
our analysis to mice with similar kinetics of primary tumour growth
(red box in Fig. 4e and Extended Data Fig. 6o). Again, more mice
developed lung metastases in the gland model (Fig. 4g), although in
this subgroup the follow-up time after tumour surgery was significantly
longer for mice from the primary tumour model (Extended Data Fig. 6p).
The number of metastases was similar (Extended Data Fig. 6q).
Together, the in vivo results are in line with in vitro findings that PIPS
induce migration and stemness of early lesion cells and proliferation
of advanced primary tumour cells.

We used the gland model to determine which metastases were
derived from early rather than late DCCs and performed phylogenetic
analyses of 28 primary tumours with 1 or more lung metastases in
the same mouse. Phylogenetic trees were generated from copy-number
alterations, since BALB-NeuT tumours rarely display point mutations,
similar to human breast cancer. Total numbers of copy-number
alterations were indistinguishable between primary tumours and
metastases (Extended Data Fig. 7a) and no individual change was
significantly associated with primary tumour or metastasis origin
(Extended Data Fig. 7b). This suggested that aberrations shared
between primary tumours and metastases had been acquired before the
two lineages diverged, and therefore indicate the time of genetic
divergence (Fig. 4h, i). In all cases, we observed branching evolution,
with no or one ancestor (single metastasis) or several ancestors
(multiple metastases; Fig. 4i and Extended Data Figs 7-9). To determine
whether each metastasis derived from an early or a late DCC, we assessed
the proportion of primary tumour alterations that were already present
in the last common ancestor. In linear progression, this value would be 1
(the primary tumour had acquired all alterations before the metastatic
precursor left the site). Here, we used the very conservative threshold
of >0.5 for late dissemination. Notably, we found that for 35 out of
in HER2^high/PGR^high human mammary carcinomas. e, Copy-number
alterations in human primary breast cancers (from Progenetix database)
and DCCs isolated from bone marrow of patients with breast cancer with
and without metastasis (M0, n = 94; M1, n = 91). The y axis depicts the
percentage of samples with aberrations (green, gain; red, loss) for each
chromosomal region. f, Oestrogen receptor (ESR1) and progesterone
receptor (PGR) transcript expression in human breast cancer DCCs
(10 out of 26 DCCs from 19 M0 patients are shown; see Supplementary
Table 8). ACTB, EEF1A1 and GAPDH denote controls for sample quality.
BT474 single cells are a positive control. *P < 0.05; **P < 0.01; NS, not
significant (χ² test (a and b); Mann-Whitney U-test (d)).


44 individual primary tumour-metastasis pairs (79.5%), lung
metastases were derived from eDCCs that disseminated before the
primary tumour had acquired 50% of its alterations (Fig. 4j).
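The timing rule described above can be illustrated with a minimal Python sketch. It assumes that each tumour's copy-number profile has already been reduced to a set of discrete alteration labels (the labels below are hypothetical placeholders) and, following the reasoning in the text, treats the alterations shared by a primary tumour-metastasis pair as those present in their last common ancestor. It is a simplification for illustration, not the tree-based phylogenetic analysis itself.

```python
from typing import FrozenSet

# Hypothetical representation: each copy-number alteration is reduced to a
# string label; real profiles would come from the CNA calls behind Fig. 4.
Alterations = FrozenSet[str]


def fraction_in_ancestor(primary: Alterations, metastasis: Alterations) -> float:
    """Share of primary-tumour alterations assumed present in the last common
    ancestor, approximated as the alterations shared with the metastasis."""
    if not primary:
        raise ValueError("primary tumour profile has no alterations")
    return len(primary & metastasis) / len(primary)


def dissemination_timing(primary: Alterations, metastasis: Alterations,
                         threshold: float = 0.5) -> str:
    """Call 'late' only if more than `threshold` of the primary tumour's
    alterations predate the split (the conservative >0.5 rule); else 'early'."""
    share = fraction_in_ancestor(primary, metastasis)
    return "late" if share > threshold else "early"


if __name__ == "__main__":
    primary = frozenset({"cna1", "cna2", "cna3", "cna4"})
    met_a = frozenset({"cna1", "cna9"})                  # shares 1 of 4 -> 0.25
    met_b = frozenset({"cna1", "cna2", "cna3", "cna7"})  # shares 3 of 4 -> 0.75
    print(dissemination_timing(primary, met_a))  # early
    print(dissemination_timing(primary, met_b))  # late
    # Linear progression: the metastasis carries every primary alteration -> 1.0
    print(fraction_in_ancestor(primary, primary | {"cna8"}))  # 1.0
    print(f"{35 / 44:.1%}")  # 79.5%, the share of early-derived pairs in the text
```

Note that a value of exactly 0.5 is classified as early, because the rule for late dissemination is strictly greater than 0.5.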


HER2 and PGR cooperation in human metastasis 

The mechanisms of breast cancer dissemination described above cannot
be studied directly in patients, as the event occurs before diagnosis.
We therefore checked whether, as in the BALB-NeuT model (Extended Data
Fig. 1a), human breast cancers seed relatively fewer bone-marrow DCCs
(detected by cytokeratin staining) as primary tumours grow. Indeed,
neither the percentage of DCC-positive patients nor the number of DCCs
in the bone marrow of 2,239 patients with breast cancer increased with
tumour diameter (Fig. 5a). We then investigated whether DCC numbers
were associated with HER2 and PGR expression and categorized primary
tumours according to their expression levels of PGR and HER2
(Supplementary Table 4). Notably, in breast cancers with a high PGR
score, genetic activation of HER2 increased the dissemination rate
(P < 0.05; Fig. 5b), akin to early lesions in the BALB-NeuT model. This
subgroup of patients (HER2^amp/PGR^high) comprised 3.7% of all patients
(85 out of 2,239) or 24.6% of patients with HER2-amplified tumours
(85 out of 345).
HER2^amp/PGR^high (Supplementary Table 5) primary tumours contained
HER2 and PGR single-positive, double-positive and double-negative cells
(Extended Data Fig. 10a). Areas of high cell density lacked PGR
expression, whereas invasive regions of lower density contained strongly
double-positive cells (Fig. 5c), suggesting density-mediated PGR
regulation within the same samples. We therefore analysed the expression
of PGR-regulatory miRNAs identified in the BALB-NeuT mouse model
(Supplementary Table 6). HER2^amp/PGR^high tumours displayed lower
levels of PGR-downregulating miRNAs than HER2^amp/PGR^low tumours
(Fig. 5d), similar to the human HER2^high/PGR^high cell lines BT474 and
T47D (Extended Data Fig. 5a, b). PGR-negative, high-density regions from
HER2^amp/PGR^high samples (Fig. 5c) displayed strong overexpression of
PGR-regulating miRNAs (Extended Data Fig. 10b).
DCCs from patients with breast cancer without metastasis (stage M0;
n = 94 cells; Supplementary Table 7) and with metastasis (stage M1;
n = 91 cells) were compared to bulk primary breast cancers (n = 1,637)
from a large database (Progenetix database; http://www.progenetix.net).
Comparative genomic hybridization profiles of primary tumours were very
similar to those of single DCCs isolated from M1-stage patients
(Fig. 5e). By contrast, single DCCs from M0-stage patients, although
clearly aberrant, lacked the chromosomal gains and losses characteristic
of primary tumours, such as 8p loss or 8q gain (Fig. 5e). Thus we
conclude that: (i) DCCs often disseminate before the acquisition of
typical breast cancer copy-number alterations; (ii) tumour cells
displaying the typical karyotype of established tumours are rarely found
in the bone marrow at the M0 stage of disease; (iii) DCCs displaying
M1-like genomes must replace DCCs with M0-like genomes to generate
metastatic disease, even when the primary tumour has been surgically
removed.
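The comparison in Fig. 5e rests on per-region aberration frequencies: for each chromosomal region, the percentage of samples carrying a gain and the percentage carrying a loss. A minimal sketch of that tally follows, assuming each sample (bulk tumour or single DCC) has already been called as gain, loss or balanced per region; the region names and calls below are toy values, and the CGH calling step itself is not shown.

```python
from typing import Dict, List, Tuple

GAIN, LOSS = 1, -1  # assumed per-region calls; 0 or a missing key = balanced


def aberration_frequencies(profiles: List[Dict[str, int]],
                           regions: List[str]) -> Dict[str, Tuple[float, float]]:
    """Return {region: (% of samples with gain, % of samples with loss)}."""
    if not profiles:
        raise ValueError("need at least one profile")
    n = len(profiles)
    freqs = {}
    for region in regions:
        gains = sum(1 for p in profiles if p.get(region, 0) == GAIN)
        losses = sum(1 for p in profiles if p.get(region, 0) == LOSS)
        freqs[region] = (100.0 * gains / n, 100.0 * losses / n)
    return freqs


if __name__ == "__main__":
    # Toy calls for four hypothetical single cells.
    cells = [{"8p": LOSS, "8q": GAIN}, {"8q": GAIN}, {"8p": LOSS}, {}]
    for region, (gain_pct, loss_pct) in aberration_frequencies(cells, ["8p", "8q"]).items():
        print(f"{region}: gain {gain_pct:.0f}%, loss {loss_pct:.0f}%")
```

Running the same tally separately on M0 DCCs, M1 DCCs and bulk tumours gives the three profiles whose shapes are compared in the figure.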

Finally, we tested whether human DCCs (n = 26) isolated from the
bone marrow of 18 patients with luminal and 1 patient with
triple-negative breast cancer (Supplementary Table 8) lacked PGR
expression, as predicted by the BALB-NeuT model (Extended Data Fig. 3o).
We identified and profiled DCCs by combined genome and transcriptome
analyses of single cells. None of the 26 DCCs displaying genomic
aberrations expressed PGR (Fig. 5f). Notably, the only DCC expressing
HER2 transcripts originated from one of two patients diagnosed with
hormone receptor-positive DCIS only.


Discussion 

This work provides evidence that mouse and human mammary cancer 
cells migrate and disseminate from morphologically very early lesions. 
The mechanism, which is shut down as primary tumours grow and 
overt lesions develop, consists of three major components: cell density, 
PGR signalling and HER2 signalling. While the specific molecu- 
lar details are more likely to be tissue dependent than universal, our 
proposed mechanism may provide a general framework for the under- 
standing of metastasis with cancer cells undergoing a switch from a 
dissemination to proliferation mode (Fig. 3g). 

Our findings challenge the concept that late-disseminating (that is,
shortly before surgery), fully mature cancer cells necessarily have a
higher ability to form metastases. Indeed, our genetic analyses showed
that 80% of metastases in BALB-NeuT mice were derived from eDCCs.
The genomic profiles of human DCCs isolated intraoperatively months
to years before metastasis represented early cancer cells, not the
predominant primary tumour clones, indicating that eDCCs have yet to
acquire critical alterations, such as gains on chromosome 8q, to form
metastases. The time needed to acquire such changes may largely account
for the long latency periods and late relapses, which are becoming
increasingly relevant clinically. Therefore, parallel progression seems
to be typical rather than exceptional. This is supported by a recent
sequencing study, in which not a single case of linear progression from
primary tumour to metastasis was found.

Our data indicate that breast cancer, independently of its subtype,
hijacks for dissemination a developmental program in which progesterone
and its paracrine signals regulate mammary epithelial branching, fat pad
invasion and mammary stem-cell expansion during development and
pregnancy. Relevance to human disease is highlighted by a careful
analysis of mortality from DCIS, hitherto defined as a pre-invasive
lesion. Of the 3% of patients with DCIS who died of breast cancer, more
than 50% died of metastasis without local relapse, indicating lethal
dissemination before surgery of the pre-invasive lesion. Evidently,
dissemination will also occur early if the tumour is not diagnosed as
DCIS but later as invasive cancer. Moreover, mortality from DCIS
increased significantly to 8% for young women, possibly because in young
women the epithelial compartment and the microenvironment have high PGR
expression.

The gradual generation of a HER2^high/PGR^low phenotype may explain
why early lesions of the BALB-NeuT model, but not advanced tumours,
were found to represent human luminal tumours. The BALB-NeuT
model apparently represents a mixed luminal and HER2 phenotype and
models several breast cancer subtypes. Cell density, progesterone, PIPS
and HER2 signalling regulated dissemination and sphere formation in
breast cancer cell lines of all subtypes, similar to the BALB-NeuT model.
Notably, HER2-positive circulating tumour cells were detected in
patients with DCIS or lobular carcinoma in situ and in patients with
M0-stage breast cancer, irrespective of the HER2 status of the primary
tumour.

High cell density and HER2 expression were responsible for the
proliferative switch of mammary epithelial cells. Oncogenic mutations
characterize benign tumours that do not metastasize; indeed, strong
activation of oncogenic pathways represses metastasis while increasing
proliferation. However, our experiments do not exclude the possibility
that advanced tumours also give rise to metastases, because the
early-lesion dissemination program may become re-activated in areas of
low cell density.

Our findings have implications for the understanding of metastasis 
and development of therapies: systemically spread cancer cells probably 
comprise cells derived from different stages of primary tumour evolu- 
tion, including the earliest. Since DCCs from early and later stages have 
metastatic potential, therapies targeting the seed of metastasis need to 
address this heterogeneity. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Mice. BALB-NeuT transgenic mice were obtained through collaboration with
G. Forni and maintained in our facilities according to European Union
guidelines. All animal experiments were performed according to EU and
national institutional regulations. Mice were screened at 3-4 weeks of age
for hemizygosity (neuT+/neuT−), and negative littermates served as
wild-type BALB/c controls. Mammary glands of BALB-NeuT female mice were
inspected twice a week and arising tumours were measured in two
perpendicular diameters. Data acquisition for bone-marrow DCCs was
performed in a blinded manner, whereas lung metastases were enumerated
unblinded by two observers. All experimental animal procedures were
approved and conducted according to German federal and state regulations
(Government of Upper Palatinate, 55.2-2532.1-27/14).

Surgery and transplantation experiments. Mice were anaesthetized by
intraperitoneal injection of midazolam (5 mg kg⁻¹), fentanyl (0.05 mg kg⁻¹)
and medetomidine (0.5 mg kg⁻¹). The thorax and abdomen were shaved and the
skin was incised from caudal to cranial along the midline. Fifty spheres
were mixed with Matrigel (BD Biosciences, 356231; final concentration 40%)
and injected into the fourth right mammary gland of BALB/c mice (4- and
40-week-old mice). For tissue transplantation, a piece (approximately
1 mm³) of donor mammary tissue from 4-week-old BALB-NeuT mice (gland
model) or of primary tumours (primary tumour model) was implanted in the
cleared mammary fat pad of recipient mice (4-week-old BALB/c mice). The
skin was closed with a polygelatin suture (Ethicon) and anaesthesia was
antagonized by subcutaneous injection of flumazenil (0.5 mg kg⁻¹),
atipamezole (2.5 mg kg⁻¹) and naloxone (1.2 mg kg⁻¹). Postoperative
analgesia was achieved by subcutaneous injection of buprenorphine
(0.1 mg kg⁻¹). Curative surgery or dissection was done when tumour
diameters were between 5 and 10 mm. After surgery, mice were kept until
the first general signs of reduced health were observed. After
dissection, lungs were macroscopically inspected and individual
metastases counted.

Extent of dissemination relative to tumour area. Data were taken from ref. 8.
Briefly, gland or tumour areas were calculated from 270 mammary glands or
tumours of 27 mice, assuming the shape of an ellipse or circle for each
tumour. The tumour area of mammary glands without palpable (that is, not
measurable by a caliper) tumours was set to 0.1 mm² (that is, assuming a
total circular hyperplastic lesion of 350 µm diameter within a mammary
gland) for lesions from 4-9-week-old mice and to 0.4 mm² for 11-week-old
mice. The adjustment for 11-week-old mice was based on a microscopic
evaluation showing an approximately 4-fold increase in hyperplastic
lesions. Dissemination to the bone marrow was determined as the number of
cytokeratin-positive cells per 10⁶ bone-marrow cells (Extended Data
Fig. 1a).
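The area normalization described above can be sketched in Python as follows. The ellipse formula (π/4 × d₁ × d₂ for two perpendicular diameters) and the fixed defaults for non-palpable glands follow the text. How per-gland areas were combined with DCC counts is not stated, so summing the areas of one mouse's glands (270 glands from 27 mice implies ten per mouse) and dividing DCCs per million bone-marrow cells by that sum is an assumption, and the helper names are hypothetical. As a sanity check, a circle of 350 µm diameter gives about 0.096 mm², consistent with the 0.1 mm² default.

```python
import math
from typing import Optional, Sequence, Tuple


def ellipse_area_mm2(d1_mm: float, d2_mm: float) -> float:
    """Ellipse area from two perpendicular diameters (a circle if d1 == d2)."""
    return math.pi * d1_mm * d2_mm / 4.0


def gland_area_mm2(age_weeks: int,
                   diameters_mm: Optional[Tuple[float, float]] = None) -> float:
    """Caliper-based tumour area, or the fixed default for a non-palpable gland."""
    if diameters_mm is not None:
        return ellipse_area_mm2(*diameters_mm)
    if 4 <= age_weeks <= 9:
        return 0.1  # ~ one circular hyperplastic lesion of 350 um diameter
    if age_weeks == 11:
        return 0.4  # ~4-fold more hyperplasia at 11 weeks
    raise ValueError("no default area stated for this age")


def dccs_per_million(ck_positive: int, cells_screened: int) -> float:
    """Cytokeratin-positive cells per 10^6 bone-marrow cells."""
    return ck_positive / cells_screened * 1e6


def dissemination_per_mm2(ck_positive: int, cells_screened: int,
                          gland_areas_mm2: Sequence[float]) -> float:
    """ASSUMED normalization (hypothetical helper): DCCs per 10^6 cells
    divided by the summed lesion area of one mouse's glands."""
    return dccs_per_million(ck_positive, cells_screened) / sum(gland_areas_mm2)


if __name__ == "__main__":
    print(round(ellipse_area_mm2(0.35, 0.35), 3))  # 0.096, close to the 0.1 mm2 default
    areas = [gland_area_mm2(7)] * 9 + [gland_area_mm2(7, (5.0, 8.0))]
    print(dissemination_per_mm2(3, 2_000_000, areas))
```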

Mouse bone marrow preparation and staining for DCCs. Bone marrow was 
collected from femurs and tibiae. The bone marrow was rinsed with a 26-G needle 
with 1 ml of PBS. After density gradient centrifugation, 5 x 10° interphase cells 
were put on adhesion slides (Menzel). At least 10° cells per mouse were stained to 
detect positive cells. Blocking solution (5% rabbit serum in 1× TBS
(50 mM Tris base, 150 mM NaCl, pH 7.4)) was added to the slides to
rehydrate the cells and to block nonspecific binding of antibodies. After
20 min the blocking solution was discarded and primary antibody against
cytokeratins 8 and 18 (CK8/18; all antibodies and working concentrations
are listed in Supplementary Table 9) or, as a control, guinea pig serum
(the CK antibody originated from guinea pig) was added, and slides were
incubated for 60 min. The primary antibody was discarded and the slides
were washed three times for 3 min in 1× TBS. The slides were incubated
with the secondary antibody for 25 min, washed three times for 3 min in
1× TBS and then incubated with the ABC complex (Vector Laboratories) for
25 min. Finally, the BCIP/NBT development system (AP Conjugate Substrate
Kit, Bio-Rad Laboratories GmbH, 1706432; levamisole hydrochloride,
Sigma-Aldrich GmbH, L-9756) was added as the alkaline phosphatase
substrate for 10 min. The slides were washed three times for 3 min and
screened for CK8/18-positive cells, which were typically violet to black.
TUBO, a tumour cell line derived from a BALB-NeuT primary mammary tumour
and known to express CK8/18, was used as a positive control.

Laser microdissection and microarray analysis. Laser microdissection (PALM
MicroBeam from Carl Zeiss MicroImaging GmbH) was performed to dissect
metastatic lesions from lungs, primary tumours and epithelial layers of
mammary glands of BALB-NeuT mice at the time point of early lesions
(7-9-week-old mice; examples are shown in Extended Data Fig. 1b), and of
BALB/c mice at different ages (samples are described in Supplementary
Table 1). Small pieces adding up to 100,000 µm² per sample were catapulted
into a cap containing 10 µl of paramagnetic, biotinylated oligo-dT-peptide
nucleic acid bead suspension and lysis buffer (Active Motif, 29011).
Extraction of mRNA and microarray experiments were performed as described
previously. Heatmaps in Fig. 1a were generated using Euclidean distance and
complete linkage agglomerative clustering on row (gene)-wise standardized
expression data (zero mean, unit standard deviation).
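The heatmap procedure above (row-wise standardization, then agglomerative clustering with Euclidean distance and complete linkage) is sketched below in dependency-free Python. The text does not state which software was used or whether the sample or population standard deviation was applied; this sketch assumes the sample SD. A library routine such as SciPy's `scipy.cluster.hierarchy.linkage(data, method="complete", metric="euclidean")` would normally replace the naive loop shown here.

```python
import math
from statistics import mean, stdev
from typing import List, Tuple

Matrix = List[List[float]]


def standardize_rows(matrix: Matrix) -> Matrix:
    """Z-score each row (gene): zero mean, unit SD (sample SD assumed)."""
    out = []
    for row in matrix:
        m, s = mean(row), stdev(row)
        out.append([(x - m) / s if s > 0 else 0.0 for x in row])
    return out


def euclidean(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(rows: Matrix) -> Tuple[List[Tuple[int, int, float]], List[int]]:
    """Naive agglomerative clustering with complete linkage.

    Returns the merge history (cluster id, cluster id, height) and the leaf
    order of the final cluster, i.e. the row order a heatmap would display.
    Input rows have ids 0..n-1; each merge creates the next unused id.
    """
    clusters = {i: [i] for i in range(len(rows))}
    merges = []
    next_id = len(rows)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                # Complete linkage: cluster distance = largest pairwise distance.
                h = max(euclidean(rows[p], rows[q])
                        for p in clusters[a] for q in clusters[b])
                if best is None or h < best[2]:
                    best = (a, b, h)
        a, b, h = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, h))
        next_id += 1
    (leaf_order,) = clusters.values()
    return merges, leaf_order


if __name__ == "__main__":
    expr = [[1, 2, 3], [1, 2, 3.1], [3, 2, 1], [3, 2.1, 1]]  # toy genes x samples
    merges, order = complete_linkage(standardize_rows(expr))
    print(merges[0][:2], merges[1][:2], order)  # (0, 1) (2, 3) [0, 1, 2, 3]
```

On the toy rows, the two near-identical increasing profiles merge first, then the two decreasing ones. Standardizing first means genes are grouped by the shape of their expression pattern rather than by absolute level.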
Cell lines, cell culture and cell stimulation. Breast cancer cell lines (4T1,
66cl4 and 67NR) were provided by F. Miller. These cell lines were derived
from a single mammary tumour that arose spontaneously in a wild-type
BALB/cfC3H mouse. The MM3MG mouse mammary epithelial cell line, derived from
a BALB/c background, was purchased from ATCC (ATCC CRL6376). The TUBO cell
line is a cloned cell line established in vitro from a lobular carcinoma
that arose spontaneously in a BALB-NeuT mouse (gift from G. Forni). All
mouse and stably transduced cell lines were grown in DMEM medium
(Pan-Biotech, P04-03500) supplemented with 10% (20% for the TUBO cell line)
FCS (Pan-Biotech, P30-3702), 2 mM L-glutamine (Pan-Biotech, P04-80100) and
10 U ml⁻¹ penicillin/streptomycin (Pan-Biotech, P06-07050). All human cell
lines were purchased from ATCC and maintained in the medium recommended by
ATCC. The origin of the cell lines was confirmed by short tandem repeat
(STR) analysis (Cell-ID, Promega). All cells were incubated at 37 °C with
5% CO₂. Steroid hormones (progesterone, aldosterone, β-oestradiol,
testosterone and hydrocortisone; all from Sigma-Aldrich) and RU486
(Sigma-Aldrich) were dissolved in ethanol. RANKL (mouse Rankl, Abcam,
ab151200; human RANKL, Abcam, ab9958), WNT4 (mouse WNT4, R&D Systems,
475-WN; human WNT4, Abnova, H00054361-P01), lapatinib (Santa Cruz
Biotechnology, SC202205), IWP-2 (Sigma-Aldrich, I0536) and
RANKL-neutralizing antibody (Lifespan Biotech, LS-C150261) were dissolved
according to the manufacturers' instructions. All cell lines were routinely
tested for mycoplasma and were found to be negative.

Primary cultures and sphere formation assay. Fresh mammary glands or
primary tumours were digested with 200 units per ml collagenase I
(Worthington Biotech, LS004196) and 1 µg ml⁻¹ hyaluronidase (Sigma-Aldrich,
4272) in basal medium for 2 h at 37 °C. The basal medium consisted of
DMEM/F12 (PAN Biotech, P04-41450) supplemented with 10 mM HEPES buffer
(Sigma-Aldrich, H0887), penicillin/streptomycin (Pan Biotech, P1-010) and
10 µg ml⁻¹ insulin (Sigma-Aldrich, I9278). Digested tissue cells were
centrifuged and re-suspended in basal medium. The cells were subsequently
cultured at a density of 5 × 10⁴ cells per ml in ultra-low-adherence plates
coated with 1.2% poly-HEMA (Sigma-Aldrich, P3932) or, for adherent culture,
at a density of 2.6 × 10⁴ cells per cm² in DMEM medium (Pan-Biotech,
P04-03500) supplemented with 10% FCS (Pan-Biotech, P30-3702), 2 mM
L-glutamine (Pan-Biotech, P04-80100) and 10 U ml⁻¹ penicillin/streptomycin
(Pan-Biotech, P06-07050). Sphere culture medium was basal medium
supplemented with 2% B27 (Gibco, 17504044), 10 µg ml⁻¹ EGF (Sigma-Aldrich,
E9644), 10 ng ml⁻¹ bFGF (Sigma-Aldrich, F0291), 20 ng ml⁻¹ hIL6 (gift from
S. Rose-John), 4 ng ml⁻¹ heparin (Sigma-Aldrich, H3149) and 5 ng ml⁻¹
GRO-α (R&D Systems, 275-GR). Concentrations of activators, inhibitors and
other molecules are given in the main text, figures or legends. Sphere
cultures were incubated at 37 °C with 5% CO₂ and 7% O₂ and screened for
spheres after 10 days. Only spheres with a diameter over 50 µm were
counted. Sphere size was inspected under a light microscope and measured
using Zeiss AxioVision software (Carl Zeiss) after 10 days.

Cell density experiments. TUBO cells were cultured at 3 × 10⁴ cells per cm²
for high-density and 5.2 × 10° cells per cm² for low-density experiments.
Primary cells derived from primary tumours were cultured at 10.6 × 10⁴ cells
per cm² for high-density and 2.2 × 10° cells per cm² for low-density
experiments. Density criteria for human cell lines were 100% confluency for
high density and 20-30% confluency for low density. For hormone treatment
and comparisons between low- and high-density experiments, cells were
incubated for 76 h, with fresh hormone treatment and washes (twice with
PBS) at 24-h intervals. We avoided changing medium and washing during
incubation of cells for miRNA analyses. In migration experiments we seeded
10° cells per well (24-well migration chambers) for low-density experiments
with all cell lines, and 5 × 10⁴ cells per well for high-density experiments
with TUBO cells and 4 × 10° for the other cell lines.

Transwell assay. Transwell inserts (Corning, 3419) with a microporous membrane 
of 0.4 µm were used to separate the upper and lower compartments. The
microporous membrane allows only soluble factors to pass between the compartments.
Early lesion cells were cultured in the lower chamber and primary tumour cells 
were cultured in the upper chamber. Both were cultured at a density of 10° cells 
per well of 6-well plates (DMEM with 10% FCS). 

Migration/sphere-formation assays. Transwell inserts (Corning, 3422) with
8-µm pores were coated with 30% Matrigel. 4 × 10⁴ cells from cell lines and
10° cells isolated from tissue or dissociated spheres were resuspended in
FCS-free medium (DMEM) before seeding. Cells were then seeded in 200 µl of
FCS-free medium on top of the Matrigel layer, and FCS medium (DMEM
containing FCS) was added to the lower chamber. For additional treatments,
medium in both upper and lower compartments was supplemented with the
reagents at the concentrations specified in the text and figures. After
incubating cells isolated from mammospheres or from freshly digested
tissues for 72 h, inserts were removed and cells were fixed with methanol
(−20 °C for 10 min) and stained with trypan blue. Cells were counted in
3 fields (4× magnification) under the microscope.
For the combined migration/sphere formation assay, cells were placed on a
layer of 30% Matrigel in the upper chamber and the lower chamber was coated
with poly-HEMA. The mammosphere medium used is described above. After 72 h,
inserts were removed, fixed and stained with trypan blue for single-cell
migration analysis. Fresh sphere medium (600 µl) was added to the lower
chamber and cells were incubated for 11 days, when spheres were counted
(Fig. 2e and Extended Data Fig. 2g).

Proliferation assay. Single-cell suspensions were cultured in 96-well plates
(Corning Inc) and proliferation was evaluated with an XTT colorimetric
assay kit (Roche, 11465015001) according to the manufacturer's
instructions. Cells were seeded at 3,000 cells per well. The experiment was
performed with 6 technical replicates. The medium was supplemented with the
tested factors or hormones and vehicle (see corresponding experiments) and
was changed every second day.

Immunohistochemistry. For PGR and HER2 immunohistochemistry of tissue
sections, we placed 5-µm sections of paraffin blocks onto poly-L-lysine-coated
slides. Samples were dewaxed by two 5-min washes in xylene and rehydrated
through graded alcohols by 5-min washes and a final wash in water. A
standard Tris-EDTA buffer and pressure-cooking were used for antigen
retrieval, and sections were then blocked in 0.3% H₂O₂ in TBS and 10%
normal goat serum. Sections were incubated for 1 h with primary antibodies
and, after washing, secondary antibodies (Vector Laboratories, PK4001 or
PK5000) were added at the manufacturer's recommended dilution (see
Supplementary Table 9). After washing with PBS, sections were stained using
the ABC detection system (Vector Laboratories) according to the
manufacturer's instructions. Visualization was performed with chromogen
reagent (Dako, 10046560) according to the manufacturer's instructions.

Immunofluorescence staining. For staining of cells from monolayer cell
cultures, cells were seeded onto 24-well culture plates at an appropriate
density. After 72 h of incubation, cells were washed with PBS and fixed
with 4% PFA for 10 min. Cells were then permeabilized with 0.2% Triton
X-100, washed, blocked with 1% BSA in PBS at 37 °C and incubated with
primary antibody (see Supplementary Table 9) for 1 h at room temperature.
Cells were then washed three times with PBS and incubated with labelled
secondary antibody (Jackson ImmunoResearch Laboratories) for 1 h at room
temperature. For nuclear counterstaining, cells were incubated for 10 min
with 0.5 µg ml⁻¹ DAPI (Sigma-Aldrich). For the staining of spheres in
differentiation experiments, mammospheres were picked, transferred to a
24-well cell-culture plate and incubated for 8 h in sphere medium to allow
them to attach to the surface. The subsequent staining protocol was as for
monolayer cell cultures. For staining of cells attached to the inserts from
migration experiments, inserts were used directly after migration (see
migration assay), and the blocking and immunofluorescence steps of the
monolayer staining procedure were applied. Images were captured on an
AxioVert 200M microscope (Carl Zeiss Microscopy).

Quantification of HER2 and PGR staining in tissue sections by TissueFAXS
cytometry. Tissue sections were stained with an automated staining machine
(Ventana, BenchMark ULTRA). Tissue sections used for analysis were stained
within the same run. Images of stained tissue sections were scanned with the
TissueFAXS i-plus imaging system (TissueGnostics, Vienna, Austria; acquisition
software: TissueFAXS v3.5.129) equipped with a digital Pixelink colour camera
(PCO AG). Images were analysed for HER2 and PGR staining with HistoQuest
software v3.5.3.0185 (TissueGnostics). In HistoQuest, two markers were
created: haematoxylin as master marker (nucleus) and HER2 or PGR as
non-master marker. To achieve optimal cell detection, the following
parameters were adjusted: (i) nuclei size; (ii) discrimination by area;
(iii) discrimination by grey and (iv) background threshold. To evaluate the
HER2 staining intensity of cells or the percentage of PGR-expressing cells,
histograms were created, allowing the visualization of corresponding cells in
the source region of interest using the real-time back-gating feature. The
cut-off discriminated between false events and specific signals according to
cell size and staining intensity. For HER2 staining, 38,675 primary tumour
cells (6 regions, 1.99 mm²), 28,850 cells from hyperplastic regions
(25 regions, 1.55 mm²) and 14,938 cells from non-transformed ducts
(30 regions, 0.93 mm²) were analysed. For PGR, 12,269 cells of early lesions
(hyperplasia, 7 weeks, 11 regions, 0.5 mm²), 12,702 non-transformed cells
(normal duct, 7 weeks, 56 regions, 0.7 mm²) and 25,357 primary tumour cells
(9 regions, 1.3 mm²) were analysed (Fig. 1d).
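HistoQuest's segmentation and back-gating are proprietary and are not reproduced here. The sketch below only illustrates the final scoring step, counting a cell as positive when its marker intensity exceeds a cut-off (the intensities and cut-off are hypothetical), plus a rough cross-check of the HER2 counts above: mean cell density per analysed area, obtained by dividing the reported cell numbers by the reported areas.

```python
from typing import Sequence


def percent_positive(intensities: Sequence[float], cutoff: float) -> float:
    """Percentage of detected nuclei whose marker intensity exceeds `cutoff`."""
    if not intensities:
        raise ValueError("no cells detected")
    positive = sum(1 for x in intensities if x > cutoff)
    return 100.0 * positive / len(intensities)


def cells_per_mm2(n_cells: int, area_mm2: float) -> float:
    """Mean cell density of the analysed regions."""
    return n_cells / area_mm2


if __name__ == "__main__":
    # Hypothetical per-cell PGR intensities and cut-off (arbitrary units).
    print(percent_positive([12.0, 48.0, 85.0, 130.0], cutoff=60.0))  # 50.0
    # HER2 analysis: cell counts and areas as reported in the text.
    for label, n, area in [("primary tumour", 38_675, 1.99),
                           ("hyperplasia", 28_850, 1.55),
                           ("normal duct", 14_938, 0.93)]:
        print(f"{label}: {cells_per_mm2(n, area):,.0f} cells per mm2")
```

The density figures are averages over whole regions and do not capture local density differences within a section.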

Quantitative PCR. All mRNA extractions were performed using the RNeasy kit 
(Qiagen, 74104) according to the manufacturer’s instructions. For miRNA extrac- 
tion, the miScript II RT Kit (Qiagen, 217004) was used. cDNA was generated 
using a reverse transcriptase kit (Qiagen, 205311 for total RNA and 218161 for 
miRNA). Finally, 25 ng of cDNA was used for qPCR. qPCR was performed using 
a LightCycler instrument (Roche) and Fast Start Master SYBR Green Kits (Roche). 
Data analysis was done using the RelQuant software (Roche) with a reference 
gene and a calibrator (reference) sample in every run. Mouse reference cDNA
served as a positive control. Samples with nonspecific products in the
melting curve analysis were discarded from further analysis. Expression
levels are given relative to Actb (β-actin) for gene expression analyses and
Rnu6 for miRNA analyses (primer sequences are provided in Supplementary
Table 10). All primers for mRNA analyses were synthesized by Eurofins MWG
Operon, and those for miRNA analyses by Qiagen.
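The text states only that RelQuant was run with a reference gene and a calibrator sample; the software's exact model (for example, any efficiency correction) is not given. Assuming the widely used comparative Ct (2^−ΔΔCt) approach, calibrator-normalized expression relative to Actb or Rnu6 could be computed as in this sketch, in which the Ct values are hypothetical.

```python
def relative_expression(ct_target: float, ct_reference: float,
                        ct_target_calibrator: float, ct_reference_calibrator: float,
                        amplification_factor: float = 2.0) -> float:
    """Calibrator-normalized expression ratio.

    With amplification_factor == 2 (100% efficiency) this is the 2^-ddCt
    method; other values apply the same assumed factor to target and
    reference genes.
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_calibrator - ct_reference_calibrator
    return amplification_factor ** -(delta_sample - delta_calibrator)


if __name__ == "__main__":
    # Hypothetical Ct values: Pgr vs Actb in a sample and in the calibrator.
    print(relative_expression(24.0, 18.0, 26.0, 18.0))  # 4.0
```

Here the target crosses threshold two cycles earlier (relative to Actb) in the sample than in the calibrator, which corresponds to a 4-fold higher relative level under the doubling assumption.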

For comparison of miRNA levels in high-density and low-density regions
(Extended Data Fig. 10b) of formalin-fixed paraffin-embedded samples, regions
were punched out using a 1.5-mm puncher (PFM medical, 48115). Samples were
incubated for 10 min at 70 °C, followed by xylene-ethanol deparaffinization
and overnight proteinase K digestion (0.5 µg µl⁻¹, Roche 03115828001). miRNA
was then extracted using the miRNeasy kit. For comparison of miR-30a-5p and
miR-9-5p between HER2^high/PGR^high human mammary carcinomas and
HER2^high/PGR^− carcinomas, miRNAs were extracted from fresh-frozen samples
using the miRNeasy kit. Expression of miR-9-5p and miR-30a-5p was normalized
to HER2^−/PGR^high breast cancers (see Supplementary Table 6 for details on
patients).

Lentiviral and gamma-retroviral transduction. PGR was expressed using a
lentiviral construct encoding human PGR-B (GeneCopoeia, Z5911). Lentiviral
packaging was conducted as previously described. Helper vectors were psPAX2
and pMD2.G (Addgene). Selection was performed using 10 µg ml⁻¹ puromycin
(Sigma-Aldrich, P8833). For Her2 expression, pLXSN-NNeu (rat wild-type
Neu/Her2) was used (obtained from L. Petti). Retroviral delivery of
transgenes was performed as described previously. Helper vectors were
pCMV-VSV-G and pUMVC3, obtained from Addgene. Selection was performed using
1,000 µg ml⁻¹ G418 (Sigma-Aldrich, G9516). MM3MG cells were transduced with
lentiviral and/or retroviral vectors and cell colonies were selected using
antibiotics. Positively transduced clones were expanded and screened for PGR
and/or HER2 levels by western blot analysis and qPCR.

Western blot analysis. Cell lysates were prepared using RIPA buffer (Sigma-Aldrich,
R0278), and their protein concentration was measured with the BCA protein assay kit (Thermo
Scientific, 23227) and then equalized across samples.
Quantified protein lysates were resolved on 6.5% SDS-PAGE gels, transferred
onto a polyvinylidene difluoride membrane (Millipore) and immunoblotted with
the primary antibodies overnight, followed by incubation with horseradish
peroxidase-conjugated secondary antibodies. Blots were developed using a
substrate kit (GE Healthcare, RPN2109) and bands were imaged with an ImageQuant
LAS 4000 (GE Healthcare). The full blot images are shown in Supplementary Fig. 1.

Exosome isolation, miRNA analyses and sequencing. To prepare conditioned
medium, TUBO cells were seeded at a density of 3 × 10⁴ cells per cm². After 4
days, medium was collected, centrifuged, filtered and used as conditioned
medium. For exosome isolation we used an ultracentrifugation method as previously
described⁴⁴. Exosome pellets were resuspended in fresh medium and used
to treat the T47D cell line. PGR expression was checked at different time points
(4, 8, 24 and 48 h). For miRNA sequencing we used 4 × 10⁶ cells, and exosomes
were isolated from the medium of confluent TUBO cells. miRNA cloning and
sequencing were done as described previously⁴⁵. All pooled samples were sequenced
on a MiSeq system (Illumina) in a single-end run with 80 cycles using the MiSeq
reagent kit v3. Data analysis was performed using in-house written scripts.
Sequences were mapped, without any mismatches allowed, against mouse miRNAs
listed in miRBase v20 (June 2013; http://www.mirbase.org). The minimum
read length was set to 18 nucleotides. Annotated miRNA reads were normalized
as reads per million (RPM) according to the total number of mapped reads in the respective
library. miRNA mimics were ordered from Eurofins MWG Operon, and
all sequences are listed in Supplementary Table 11. For miRNA transfection we
used reverse transfection according to the instructions for RNAiMAX (Life
Technologies, 13778030), with a miRNA concentration of 50 nM.
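The read filtering and RPM normalization described above can be illustrated with a short Python sketch. It is not the authors' in-house script: it assumes adapter-trimmed reads and a dictionary of mature miRNA sequences (for example, parsed from miRBase), counts a read as mapped when it occurs verbatim (zero mismatches) within exactly one mature sequence, and discards multi-mapping reads, which is a hypothetical rule. The miRNA names and sequences are toy values.

```python
from collections import Counter

MIN_READ_LENGTH = 18  # minimum read length stated in the Methods


def count_mirna_reads(reads, mature_mirnas):
    """Count reads that map without mismatches to exactly one mature miRNA.

    reads: iterable of adapter-trimmed DNA read sequences (assumption).
    mature_mirnas: dict of miRNA name -> mature sequence (RNA or DNA alphabet).
    """
    # Convert reference sequences to the DNA alphabet used by the reads.
    refs = {name: seq.upper().replace("U", "T") for name, seq in mature_mirnas.items()}
    counts = Counter()
    for read in reads:
        read = read.upper()
        if len(read) < MIN_READ_LENGTH:
            continue  # shorter than the minimum read length
        # A read "maps" if it is contained verbatim in a reference (0 mismatches).
        hits = [name for name, seq in refs.items() if read in seq]
        if len(hits) == 1:  # hypothetical rule: ignore multi-mapping reads
            counts[hits[0]] += 1
    return counts


def to_rpm(counts):
    """Normalize raw counts to reads per million mapped reads (RPM)."""
    total_mapped = sum(counts.values())
    if total_mapped == 0:
        return {}
    return {name: n * 1_000_000 / total_mapped for name, n in counts.items()}


toy_refs = {"miR-A": "UCUUUGGUUAUCUAGCUGUAUGA", "miR-B": "UGUAAACAUCCUCGACUGGAAG"}
toy_reads = [
    "TCTTTGGTTATCTAGCTGTATGA",  # full-length match to miR-A
    "TCTTTGGTTATCTAGCTG",       # 18-nt exact sub-match to miR-A
    "TCTTTGG",                  # too short, discarded
    "TGTAAACATCCTCGACTGGAAG",   # full-length match to miR-B
    "TGTAAACATCCTCGACTGGAAC",   # one mismatch, stays unmapped
]
counts = count_mirna_reads(toy_reads, toy_refs)
rpm = to_rpm(counts)
```

Here both miR-A reads are counted, the 7-nt read is discarded and the mismatched read stays unmapped, so the RPM values are two-thirds and one-third of one million. In practice, how multi-mapping reads are handled is a design choice the Methods do not specify.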

Mouse genome comparative genomic hybridization microarrays. DNA samples
were extracted from freshly frozen samples using the DNeasy Blood & Tissue Kit 
(Qiagen, 69504). Genomic DNA labelling was done using the Agilent SureTag 
DNA Labelling Kit (Agilent, 5190-3400). Array CGH was performed on oligo- 
nucleotide-based SurePrint G3 Mouse CGH Microarray Kit, 4x180K (design 
code: 027411) according to the protocol provided by the manufacturer (Agilent 
Oligonucleotide Array-Based CGH for Genomic DNA Analysis, v.7.2, July 2012). 
Phylogeny of primary tumours and metastases. Ancestral relations among matched 
samples of primary tumour and metastases were inferred using array CGH pro- 
files. The array comparative genomic hybridization dataset consisted of 28 primary 
tumour samples with 1-3 corresponding metastasis samples (18 primary tumours 
with 1 metastasis; 4 with 2 metastases and 6 with 3 metastases). 

Positions of the probes on the array were mapped to the current mouse reference
genome (mm10) using the liftOver tool⁴⁶. No background correction was
applied to the data⁴⁷. The data were first normalized within arrays using Loess⁴⁸.
Then, log ratios were corrected for spatial artefacts using a median filter with an
11 × 11 block of probes⁴⁹, and a between-array scale normalization was applied⁵⁰.
Duplicate probes (having the same genomic position) were summarized by their
median log ratio. The R package limma⁵¹ (v.3.28.1) was used for normalization
within arrays and between arrays (with default parameters). In a final step, wavy
patterns were removed from the data using an approach similar to ref. 52, but with
modifications to account for broad copy-number alterations. For every sample and
chromosome, correction was carried out as follows. Because the maximum number
of broad copy-number alterations observed on any chromosome in the data was
two, a piecewise constant function with two pieces was fitted to the log ratios to
estimate these broad alterations. Each piece was required to be longer than 5%
of the chromosome length to avoid spurious small pieces. The wavy pattern was
estimated by fitting a Loess curve with a window size of 100 probes to the residuals
of the piecewise constant fit. To avoid smoothing out true focal alterations,
probes whose log ratio deviated from the piecewise constant function by more than
0.5 (absolute value) were given a weight of 0 in the Loess fit. The estimated wavy
pattern was then subtracted from the log ratios, yielding the corrected values.
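The wave-correction step can be illustrated with a pure-Python sketch for one chromosome of one sample. It follows the description above (a least-squares piecewise constant fit with two pieces, each longer than 5% of the chromosome; a smoother with a 100-probe window applied to the residuals; zero weight for probes deviating by more than 0.5). The smoother is a simple local linear regression with tricube weights written for illustration, not R's `loess`, and measuring piece length by probe coordinates is an assumption.

```python
import math

MIN_PIECE_FRACTION = 0.05  # each piece must be longer than 5% of the chromosome
LOESS_WINDOW = 100         # smoothing window size in probes
OUTLIER_CUTOFF = 0.5       # probes deviating more than this get weight 0


def two_piece_fit(pos, x):
    """Least-squares piecewise constant fit with two pieces (one change point)."""
    n = len(x)
    span = pos[-1] - pos[0]
    ps, ps2 = [0.0], [0.0]  # prefix sums of values and squared values
    for v in x:
        ps.append(ps[-1] + v)
        ps2.append(ps2[-1] + v * v)

    def segment(a, b):  # mean and residual sum of squares of x[a:b]
        m = (ps[b] - ps[a]) / (b - a)
        return m, (ps2[b] - ps2[a]) - (b - a) * m * m

    mean_all, best_sse = segment(0, n)
    best_fit = [mean_all] * n  # fallback if no admissible change point exists
    for k in range(1, n):
        too_short = (pos[k - 1] - pos[0] <= MIN_PIECE_FRACTION * span
                     or pos[-1] - pos[k] <= MIN_PIECE_FRACTION * span)
        if too_short:
            continue
        (m1, s1), (m2, s2) = segment(0, k), segment(k, n)
        if s1 + s2 < best_sse:
            best_sse, best_fit = s1 + s2, [m1] * k + [m2] * (n - k)
    return best_fit


def local_linear_smooth(pos, r, w, window=LOESS_WINDOW):
    """Loess-style smoother: weighted local linear fit over `window` nearby probes."""
    n, k = len(r), min(window, len(r))
    out = []
    for i in range(n):
        lo = max(0, min(i - k // 2, n - k))
        idx = range(lo, lo + k)
        dmax = max(abs(pos[j] - pos[i]) for j in idx) or 1.0
        # Tricube distance weights multiplied by the outlier weights.
        ws = [w[j] * (1 - (abs(pos[j] - pos[i]) / dmax) ** 3) ** 3 for j in idx]
        sw = sum(ws)
        if sw == 0:
            out.append(0.0)
            continue
        mx = sum(a * pos[j] for a, j in zip(ws, idx)) / sw
        my = sum(a * r[j] for a, j in zip(ws, idx)) / sw
        sxx = sum(a * (pos[j] - mx) ** 2 for a, j in zip(ws, idx))
        sxy = sum(a * (pos[j] - mx) * (r[j] - my) for a, j in zip(ws, idx))
        slope = sxy / sxx if sxx > 0 else 0.0
        out.append(my + slope * (pos[i] - mx))
    return out


def correct_wave(pos, log_ratios):
    """Remove a wavy pattern from one chromosome while keeping broad alterations."""
    fit = two_piece_fit(pos, log_ratios)
    residuals = [v - f for v, f in zip(log_ratios, fit)]
    # Zero weight for probes that are likely true focal alterations.
    weights = [0.0 if abs(e) > OUTLIER_CUTOFF else 1.0 for e in residuals]
    wave = local_linear_smooth(pos, residuals, weights)
    return [v - s for v, s in zip(log_ratios, wave)]
```

Because the broad step is absorbed by the two-piece fit before smoothing, a clean gain passes through unchanged, whereas a superimposed wave is largely removed.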

After normalization, the log ratios were segmented using Circular Binary
Segmentation as implemented in the R package DNAcopy⁵³ (v.1.46.0). The default
parameters were used except for α, which was set to 0.001. Segments of five
or fewer probes were merged with the closest adjacent segment. For every sample,
states with means closer than 0.05 were merged iteratively, beginning with the
two states with the closest means. When two states were merged, the new mean was
the mean of the two old state means weighted by the number of probes
in each state. After state merging, the remaining segment means were adjusted
to have a median of 0. These segmented copy-number profiles were then deconstructed
into underlying copy-number events using Ziggurat Deconstruction⁵⁴.
All these steps were performed using R (v.3.3.0).
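The state-merging and median-centring rules can be sketched as below. The sketch assumes that each segment is given as a (number of probes, mean log ratio) pair and initially forms its own state; taking the median over probes rather than over segments is an assumption, because the text does not specify it. Function names are hypothetical.

```python
from statistics import median

MERGE_THRESHOLD = 0.05  # states with means closer than this are merged


def merge_states(segments, threshold=MERGE_THRESHOLD):
    """Iteratively merge the two closest states while their means differ by < threshold.

    segments: list of (n_probes, mean) tuples for one sample.
    Returns the adjusted segment means in the input order.
    """
    # Each state: [mean, total probes, indices of member segments].
    states = [[mean, n, [i]] for i, (n, mean) in enumerate(segments)]
    while len(states) > 1:
        states.sort(key=lambda s: s[0])
        # In sorted order, the closest pair of states is always adjacent.
        gap, j = min((states[k + 1][0] - states[k][0], k) for k in range(len(states) - 1))
        if gap >= threshold:
            break
        a, b = states[j], states[j + 1]
        n = a[1] + b[1]
        # New mean = probe-weighted mean of the two old state means.
        states[j:j + 2] = [[(a[0] * a[1] + b[0] * b[1]) / n, n, a[2] + b[2]]]
    means = [0.0] * len(segments)
    for mean, _, members in states:
        for i in members:
            means[i] = mean
    return means


def median_center(segments, means):
    """Shift segment means so that their probe-level median is 0 (assumption)."""
    probe_values = [m for (n, _), m in zip(segments, means) for _ in range(n)]
    shift = median(probe_values)
    return [m - shift for m in means]


segs = [(100, 0.00), (50, 0.03), (30, 0.30), (20, 0.32)]
merged = merge_states(segs)
centred = median_center(segs, merged)
```

With these toy segments, 0.30 and 0.32 merge first (into 0.308), then 0.00 and 0.03 merge (into 0.01); after median-centring the means are 0 and 0.298.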

Aberration events, defined by their left and right change points and aberration type,
were pooled across the matched samples of a single mouse to form a mouse-specific
base set of aberration prototypes. For this, amplifications and deletions that were
more than 80% similar, as measured by the Jaccard index of their probe
support, were merged into single prototypes using Jaccard-distance-based complete
linkage clustering and union of supports. Individual primary tumour and
metastasis samples were then encoded according to the presence or absence of the
prototypic aberrations, whereby the present prototype was called by the minimum
Jaccard distance. The resulting feature vectors were then used for phylogenetic
tree inference. Phylogenetic trees were generated by assuming ideal (that is, error-free)
data and inferring plausible common ancestors (intermediates) of aberration
profiles by extracting shared features of an increasing number of samples, that
is, evaluating common aberration events in sample pairs, triplets, quads and so
on, and organizing these ancestors according to hierarchical levels. Subsequently,
admissible edges were constructed top-down between vertices, allowing for two
re-losses of acquired gains and no re-gains of any losses (this condition was also
ensured globally for each path). Then all simple paths from the normal cell to the
samples were generated using the igraph R package (v.1.0.0), combined into a
directed acyclic graph and filtered for the fewest genomic changes along the graph
and the lowest number of intermediates (maximum parsimony). This resulted in one
unique phylogenetic tree for each mouse.
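The prototype construction and sample encoding can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: each event is represented as a hypothetical (kind, frozenset of probe indices) pair, complete linkage is implemented naively, and "more than 80% similar" is read as a Jaccard distance below 0.2. The subsequent ancestor inference and parsimony filtering are not shown.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of probe indices."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def build_prototypes(events, max_distance=0.2):
    """Merge the pooled events of one mouse into prototypes.

    events: list of (kind, frozenset of probe indices), kind e.g. "gain" or "loss".
    Events of the same kind are clustered by complete linkage on Jaccard distance;
    clusters are merged only while every pairwise distance is < max_distance.
    A prototype's support is the union of its members' supports.
    """
    prototypes = []
    for kind in sorted({k for k, _ in events}):
        clusters = [[support] for k, support in events if k == kind]
        while len(clusters) > 1:
            best = None
            for i in range(len(clusters)):
                for j in range(i + 1, len(clusters)):
                    # Complete linkage: distance between the farthest members.
                    d = max(jaccard_distance(x, y) for x in clusters[i] for y in clusters[j])
                    if best is None or d < best[0]:
                        best = (d, i, j)
            d, i, j = best
            if d >= max_distance:
                break
            clusters[i] = clusters[i] + clusters[j]
            del clusters[j]
        prototypes.extend((kind, frozenset().union(*c)) for c in clusters)
    return prototypes


def encode_sample(sample_events, prototypes):
    """Binary presence vector: each event calls its nearest prototype of the same kind."""
    present = [0] * len(prototypes)
    for kind, support in sample_events:
        candidates = [(jaccard_distance(support, p), idx)
                      for idx, (k, p) in enumerate(prototypes) if k == kind]
        if candidates:
            present[min(candidates)[1]] = 1
    return present


primary = [("gain", frozenset(range(1, 11))), ("loss", frozenset(range(50, 61)))]
metastasis = [("gain", frozenset(range(1, 10))), ("gain", frozenset(range(30, 36)))]
protos = build_prototypes(primary + metastasis)
```

In this toy example the two overlapping gains (Jaccard similarity 0.9) collapse into one prototype, giving three prototypes and the presence vectors [1, 0, 1] for the primary tumour and [1, 1, 0] for the metastasis.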

Detection of human DCCs. For CGH analysis of single DCCs, bone-marrow 
sampling of patients with M1 stage cancer was performed within the study pro- 
tocol of the GEBDIS study at the Central Hospital in Augsburg after informed, 
written consent of patients was obtained. The ethics committees of the University of 
Munich (ethics vote number 007/02) and Regensburg (ethics vote number 07-079) 
approved bone-marrow sampling (including patients with M0-stage cancer) and
genomic analysis of isolated cells. For all patients, informed, written consent was
obtained. For bone-marrow sampling and analysis for cytokeratin-positive cells of
patients from Tübingen (approval by the ethics committee of Tübingen University,
reference number 560/2012R), all specimens were obtained after written, informed
consent. Bone-marrow sample preparation, slide preparation, cytokeratin staining 
and cell isolation was performed as previously described?. 

Whole genome amplification and single cell comparative genomic hybridization. 
Whole genome amplification was performed as previously described⁵⁵. The
method has become commercially available as a kit (Ampli1, Silicon Biosystems).
A histogram of copy-number alterations (Fig. 5e) was generated for human primary
breast cancers (n = 1,637) derived from the Progenetix database (http://www.progenetix.net)
and for DCCs isolated from bone marrow of breast cancer patients
without (M0, n = 94; see Supplementary Table 7 for clinical details of patients) and
with metastasis (M1, n = 91).

Analysis of patients with breast cancer for association of DCC and primary 
tumour receptor expression. We analysed data of 2,239 patients from the 
Department of Oncology and Obstetrics, University of Tübingen. DCC status was
assessed according to the consensus protocol”, using the anti-cytokeratin antibody 
A45B/B3 and by evaluating 2 × 10⁶ bone-marrow cells. PGR expression of primary
tumours was categorized by PGR staining score: 0-1, absent expression;
2-8, intermediate expression; and 9-12, high expression. HER2 status of
primary tumours was categorized as: score 0, absence of HER2
staining (IHC negative); score 1, or score 2 without HER2 amplification (IHC
positive); and score 2 with HER2 amplification, or score 3, which is known to be caused
by HER2 amplification (FISH positive).
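The scoring rules above amount to a simple lookup. A minimal Python sketch, with hypothetical function names and category labels taken from the text, is:

```python
from typing import Optional


def pgr_category(score: int) -> str:
    """Map a PGR staining score (0-12) to the expression category used above."""
    if not 0 <= score <= 12:
        raise ValueError("PGR staining score must lie between 0 and 12")
    if score <= 1:
        return "absent"
    if score <= 8:
        return "intermediate"
    return "high"


def her2_category(ihc_score: int, amplified: Optional[bool] = None) -> str:
    """Map a HER2 IHC score (0-3) plus amplification status to the HER2 category."""
    if ihc_score == 0:
        return "IHC negative"
    if ihc_score == 1:
        return "IHC positive"
    if ihc_score == 2:
        # Score 2 is split by the result of the amplification test.
        if amplified is None:
            raise ValueError("score 2 needs HER2 amplification status")
        return "FISH positive" if amplified else "IHC positive"
    if ihc_score == 3:
        return "FISH positive"  # score 3 is taken to reflect HER2 amplification
    raise ValueError("HER2 IHC score must be 0, 1, 2 or 3")
```

Requiring the amplification status only for score 2 mirrors the text, where score 2 is the only IHC score split by amplification.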

Bioinformatics and statistical analysis. Statistical analyses and estimation of
variation within each group of data were performed using GraphPad Prism v.6 and
R v.3.3.1. For in vivo DCC experiments comparing primary tumour and early lesion
spheres, sample size was estimated using G*Power (v3). No statistical methods
were used to predetermine sample size for other experiments. For each experiment,
mouse numbers are given in the figures or the text. All in vitro and primary culture
experiments were performed at least in triplicate, and Student's t-test was used
for comparisons. For all other experiments we applied the D'Agostino-Pearson
omnibus normality test. When sample sizes were sufficiently large (n > 8) and the data were
not normally distributed according to the D'Agostino-Pearson test (P < 0.05), we
applied the Mann-Whitney U-test. For gene signature evaluation in Figs 1c, 3d
and Extended Data Fig. 1e, g, gene-wise t-test P values were combined using
Stouffer's method. A linear regression test (F-test for slopes) was used to compare
proliferation curves and tumour growth. For comparing numbers between different
groups we applied Fisher's exact test or, if the sample numbers were at least
5 in each condition, the χ² test. In Fig. 1d, one-way ANOVA was used. All
P values are two-tailed. All P values between 0.0001 and 0.05 and all statistical tests
are given in the figures or legends. Genomatix (v2.0) (https://www.genomatix.de)
was used for signalling pathway analysis and oPOSSUM
(v1) (http://opossum.cisreg.ca/oPOSSUM3/) for transcription-factor binding-site
enrichment. For miRNA-binding enrichment we used DIANALAB
(http://diana.cslab.ece.ntua.gr/), and for the identification of target miRNAs for
single target genes the miRanda software (http://www.microrna.org/microrna/home.do)
was used. The experiments were not randomized.
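Two of the tests named above can be sketched with the Python standard library alone. The Stouffer combination assumes one-sided P values oriented in the direction of the expected change (how the authors converted two-tailed gene-wise t-test P values is not stated), the 2 × 2 χ² test is Pearson's test without continuity correction, and reading "at least 5 in each condition" as "every cell count ≥ 5" is also an assumption. Function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, s.d. 1)


def stouffer(one_sided_p):
    """Combine one-sided P values with the unweighted Stouffer Z method."""
    z_scores = [STD_NORMAL.inv_cdf(1 - p) for p in one_sided_p]
    z_combined = sum(z_scores) / math.sqrt(len(z_scores))
    return 1 - STD_NORMAL.cdf(z_combined)


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(x):
        # Hypergeometric probability of x in the top-left cell given fixed margins.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_observed = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = (table_prob(x) for x in range(lo, hi + 1))
    # Sum all tables at most as likely as the observed one (tolerance for rounding).
    return min(1.0, sum(p for p in probs if p <= p_observed * (1 + 1e-9)))


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 d.f., no continuity correction) for a 2x2 table."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 d.f.


def compare_counts(a, b, c, d):
    """Use the chi-square test if every cell has at least 5 counts, else Fisher's test."""
    if min(a, b, c, d) >= 5:
        return "chi-square", chi_square_2x2(a, b, c, d)
    return "fisher", fisher_exact_2x2(a, b, c, d)
```

For example, `compare_counts(3, 1, 1, 3)` falls back to Fisher's exact test and returns a two-sided P value of 34/70 (about 0.486).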

Data availability. The miRNA sequencing data and microarray data are depo- 
sited at the Gene Expression Omnibus (GEO) database under accession number 
GSE68683. Analysed data for microarray and miRNAs can be found in 
Supplementary Data 1 and 2, respectively. The mouse ancestral CGH data are 
deposited at the GEO database under accession number GSE87469. All raw data for 
presented graphs and statistics are deposited in Source Data files. Further material 
and data other than what is presented here can be obtained from the corresponding 
author (C.A.K.) upon request. 


40. VerMilyea, M. D. et al. Transcriptome asymmetry within mouse zygotes but not 
between early embryonic sister blastomeres. EMBO J. 30, 1841-1851 (2011). 

41. Dull, T. et al. A third-generation lentivirus vector with a conditional packaging
system. J. Virol. 72, 8463-8471 (1998). 

42. Petti, L. M. & Ray, F. A. Transformation of mortal human fibroblasts and
activation of a growth inhibitory pathway by the bovine papillomavirus E5 
oncoprotein. Cell Growth Differ. 11, 395-408 (2000). 

43. Stewart, S. A. et al. Lentivirus-delivered stable gene silencing by RNAi in 
primary cells. RNA 9, 493-501 (2003). 

44. Théry, C., Amigorena, S., Raposo, G. & Clayton, A. Isolation and characterization
of exosomes from cell culture supernatants and biological fluids. Curr.
Protoc. Cell Biol. Chapter 3, Unit 3.22 (2006).

45. Dueck, A., Eichner, A., Sixt, M. & Meister, G. A miR-155-dependent microRNA 
hierarchy in dendritic cell maturation and macrophage activation. FEBS Lett. 
588, 632-640 (2014). 

46. Meyer, L. R. et al. The UCSC genome browser database: extensions and
updates 2013. Nucleic Acids Res. 41, D64-D69 (2013).

47. Yang, Y. H., Buckley, M. J. & Speed, T. P. Analysis of cDNA microarray images. 
Brief. Bioinform. 2, 341-349 (2001). 

48. Yang, Y. H. et al. Normalization for cDNA microarray data: a robust composite 
method addressing single and multiple slide systematic variation. Nucleic 
Acids Res. 30, e15 (2002). 

49. Khojasteh, M., Lam, W. L., Ward, R. K. & MacAulay, C. A stepwise framework for 
the normalization of array CGH data. BMC Bioinformatics 6, 274 (2005). 

50. Smyth, G. K. & Speed, T. Normalization of cDNA microarray data. Methods 31, 
265-273 (2003). 

51. Ritchie, M. E. et al. limma powers differential expression analyses for
RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).

52. Marioni, J. C. et al. Breaking the waves: improved detection of copy number
variation from microarray-based comparative genomic hybridization. Genome
Biol. 8, R228 (2007).

53. Venkatraman, E. S. & Olshen, A. B. A faster circular binary segmentation
algorithm for the analysis of array CGH data. Bioinformatics 23, 657-663
(2007).

54. Mermel, C. H. et al. GISTIC2.0 facilitates sensitive and confident localization of
the targets of focal somatic copy-number alteration in human cancers.
Genome Biol. 12, R41 (2011).

55. Klein, C. A. et al. Comparative genomic hybridization, loss of heterozygosity, 
and DNA sequence analysis of single cells. Proc. Natl Acad. Sci. USA 96,
4494-4499 (1999). 




[Extended Data Figure 1 graphics, panels a-i; see caption below. Panel c table:]
Gene symbol | Position in list | Up/down in EL | qPCR validation
Ahnak | 21 | up | Yes
Rbp1 | 25 | down | Yes
Baz2a | 27 | up | Yes
Comp | 46 | down | Yes
Zfp408 | 235 | up | No
Zc3hav1 | 436 | up | Yes
Zfp780b | 465 | up | Yes
Nfatc3 | 467 | up | Yes
Rxrb | 720 | up | Yes
Lpp | 1030 | up | Yes


Extended Data Figure 1 | Early lesion signature induction and
expression of HER2 and PGR. a, The proportion of cancer cells
disseminating to the bone marrow in BALB-NeuT mice decreases with
increasing primary tumour area. DCCs were identified using anti-cytokeratin
antibodies in bone marrow samples. The y axis displays the
number of detected DCCs per 10⁶ bone marrow cells divided by the
total tumour area in mm². The number of mice used per data point is
written below the graph. b, Laser microdissection of epithelial structures:
two examples of 7-9-week-old BALB-NeuT mammary glands showing
microdissection of regions with incipient epithelial hyperplasia. For all
samples, similar amounts of tissue (up to 100,000 µm²) were isolated.
c, qPCR validation of microarray profiles. qPCR was performed for 10
genes upregulated or downregulated in microarray samples of early
lesions, and all changes except one (Zfp408) were confirmed. d, qPCR
for the mRNA level of all steroid hormone receptors (EL, early lesions;
PT, advanced primary tumour; Met, lung metastasis). e, Primary cultures
from mammary tissue of 7-9-week-old BALB-NeuT mice were treated
with different concentrations (1, 10 and 100 nM) of progesterone (P),
oestrogen (E), aldosterone (A), cortisol (C), testosterone (T) or vehicle
(ethanol; untreated) for 75 h. Only progesterone induces upregulation of
the complete early lesion signature. f, Increased expression of PGR-B in
young mammary glands (9-week-old BALB/c and BALB-NeuT mice with
early lesions compared to 25-week-old BALB/c mice), but not in primary
tumours, correlates with increased HER2 expression. g, Progesterone
induces the early lesion signature in 4T1 cells (highly aggressive and
metastatic cell line derived from a spontaneous BALB/c mammary
tumour), but not in 67NR (tumorigenic and non-metastatic cell line
derived from a spontaneous BALB/c mammary tumour) or MM3MG
cells (normal mammary epithelial cell line derived from a BALB/c mouse).
h, Progesterone treatment induces upregulation of HER2 expression
in 4T1 cells. i, Overexpression of PGR-B in MM3MG cells induces
upregulation of HER2 expression. ****P < 0.0001 (Student's t-test and
Stouffer's combined probability test); data are mean ± s.d. For gel source
data, see Supplementary Fig. 1.
Extended Data Figure 2 | Progesterone regulates migration and is
linked to branching morphogenesis. a, qPCR analysis of Pgr, Rank (also
known as Tnfrsf11a), Rankl and Wnt4 in normal and transgenic mouse
mammary tissue or tumours. Note the increased expression of Pgr, Wnt4
and Rankl in early lesions compared to primary tumours. Only Rank (the
Rankl receptor) is strongly expressed in primary tumours. b, Primary
cultures of early lesions treated with progesterone, WNT4 and RANKL.
WNT4 and RANKL treatment induce the early lesion signature and act
synergistically. c, d, PGR (green) staining at 5 and 12 weeks of age (FVB
wild-type mice). Scale bars, 100 µm. The percentage of PGR⁺ cells per
duct was quantified (n = 2 mice per age group) in the distal and proximal
portions of the gland (relative to the origin). PGR-expressing cells were
more abundant in distal ducts (the number of analysed ducts per group
is displayed above each column). LN, lymph node. e, f, Photomicrograph
of migration assay (e) and quantification of migrating cells (f) derived
from fresh tissue (e, left) or dissociated spheres derived from primary
tumours or early lesions (e, right; f, quantification). Progesterone, WNT4
and RANKL induce migration of early-lesion-derived but not primary-tumour-derived
cells (see also Fig. 2a). g, Scheme of combined migration
and sphere assay. The lower chamber is filled with serum-free sphere
medium and the bottom is covered with poly-HEMA to prevent adhesion
and enable sphere formation. After 72 h of migration, the insert is removed
and the lower chamber is analysed (after 11 days) for mammosphere
formation (see Methods). h, Effect of oestrogen and progesterone on
migration and sphere formation of mammary cells derived from early
lesions. Cells were exposed to 10 nM oestrogen or progesterone, or to 10 nM
oestrogen + 10 nM progesterone inhibitor (RU486). *P < 0.05; **P < 0.01;
***P < 0.001; ****P < 0.0001 (Student's t-test); data are mean ± s.e.m.
(d) or mean ± s.d. (other panels).
Extended Data Figure 3 | HER2 expression levels regulate migration
and proliferation. a, Parental MM3MG cells, a cell line derived from
mammary epithelial cells of wild-type BALB/c mice, do not express ERα
but express low levels of HER2 and PGR-B compared to 4T1 and TUBO
cells (TUBO cells were grown at low density; see Fig. 3 and Extended
Data Figs 4, 5). b, Immunoblot confirming successful transduction of
the MM3MG cell line with Her2 and Pgr-B. Note that transduction of
PGR-B increases HER2 levels. c, d, Overexpression of Pgr-B in MM3MG
cells (MM3MG-Pgr-B) reduces migration, whereas Her2 overexpression
(MM3MG-Her2) increases migration of cells. Addition of progesterone
does not alter migration of Pgr-B-overexpressing cells (MM3MG-Pgr-B).
e, Overexpression of Pgr-B in MM3MG cells reduces sphere formation,
whereas Her2 overexpression increases sphere formation. c-e, Migrating and
sphere-forming cells derive not from the PGR⁺ but from the PGR⁻ population,
which is responsive to PIPS. f-j, To investigate which PGR⁻ cells were
the target population of progesterone signalling, we exposed parental
MM3MG cells and Her2-transduced cells to progesterone or PIPS, or mixed
the cells with PGR⁺ cells (only for migration experiments). Progesterone,
WNT4 and RANKL, and co-culture with MM3MG-Pgr-B induced sphere
formation and migration of MM3MG cells, but decreased these responses
in MM3MG-Her2 cells. k, Overexpression of Her2 increases proliferation
of MM3MG cells (MM3MG-Her2). WNT4 and RANKL (WR) further
increase proliferation of MM3MG-Her2 cells, but decrease proliferation
of the parental (MM3MG) cells. Therefore, depending on HER2 expression,
cells either migrate (HER2low) or proliferate (HER2high). l, WNT4 and
RANKL treatment induces proliferation of primary cultured cells derived
from primary tumours, but reduces it in cells derived from early lesions.
m, Reduction of HER2 signalling by lapatinib overrides the inhibitory
effect of WNT4 and RANKL, and increases migration in MM3MG-Her2
cells. However, strong inhibition of HER2 signalling reduces migration.
n, Lapatinib inhibits HER2 signalling by preventing phosphorylation.
o, Cells that migrated through the pores of the migration chamber insert
were stained for HER2 (FITC, green) and PGR or ERα (Cy3, red). In a 1:1
co-culture of MM3MG-Pgr-B and MM3MG-Her2 (top), only HER2-expressing
cells migrate. Migrated primary cells derived from early lesions
(middle) do not express PGR and display faint HER2 staining (brightness
of HER2 and PGR staining increased by 50% for better visibility). HER2
and PGR double-positive T47D cells fixed onto the filters of migration
chambers served as a positive control for staining. m-o, Cells with low or
intermediate HER2 signalling show the strongest induction of migration
and sphere formation by PIPS. *P < 0.05; **P < 0.01;
***P < 0.001; ****P < 0.0001; NS, not significant (F-test of the slope
(k, l) or Student's t-test (other panels)); data are mean ± s.d. For gel source
data, see Supplementary Fig. 1.
Extended Data Figure 4 | Cell density and regulation of PGR and HER2
signalling. a, PGR expression silenced in tumours can be re-activated in
culture and re-silenced in vivo. b, PGR re-expression only occurs in TUBO
cells grown at low density, or at high density with frequent medium changes.
c, Downregulation of PGR in early lesion cells cultured in a transwell assay
next to primary tumour cells suggests the existence of a secreted factor
that passes through the membrane of the transwell insert and downregulates
Pgr mRNA (left) and protein (right). d, T47D cells exposed to conditioned
medium from TUBO cells display reduced PGR mRNA (left) and protein
(right). e, Exosomes derived from the cell culture medium of TUBO cells
grown at high density (exosome fraction) induce downregulation of PGR
in T47D cells. f, miRNA sequencing to identify PGR-regulating miRNAs.
Left, top 10 upregulated miRNAs in Her2-overexpressing cells (MM3MG-Her2)
compared to control (MM3MG). Middle, top 10 expressed miRNAs
in TUBO cells and TUBO-cell-derived exosomes. Right, miRNAs
predicted by the miRanda web software to regulate Pgr. g, Among all
candidate miRNAs, only miR-30a-5p and miR-9-5p induce downregulation
of Pgr mRNA in T47D cells. h, Downregulation of PGR in T47D cells
treated with miR-30a-5p and miR-9-5p. i, Expression of miR-30-5p in
early lesions and primary tumour samples compared to mammary glands
from 8-week-old BALB/c mice. j, Density-induced upregulation of HER2
in TUBO cells grown at low or high density and in early lesions compared
to primary tumour samples (left), and progesterone responsiveness of
low-density TUBO cells (right). Note that levels of HER2, PGR and the
glucocorticoid (GR), mineralocorticoid (MR) and androgen receptors
(AR) are regulated by progesterone in a dose-dependent manner, except for
ER-α (see related result in Extended Data Fig. 1d). k-m, TUBO cells grown
at low density and exposed to progesterone or PIPS migrated more (k, l)
and produced more spheres (m), similar to cells derived from early lesions
(see Fig. 2). **P < 0.01; ***P < 0.001; ****P < 0.0001 (Student's t-test);
data are mean ± s.d. For gel source data, see Supplementary Fig. 1.
Extended Data Figure 5 | Cell density and regulation of PGR and HER2
signalling in human cell lines. To investigate whether human breast
cancer cells display regulatory circuits similar to those found in mouse cells, we
selected 16 cell lines of different breast cancer subtypes. a, HER2 mRNA
expression levels in 15 human cell lines compared to the hTERT-HME
cell line. Different colours indicate the breast cancer subtype of each
cell line. b, Expression of miR-9-5p in human breast cancer cell lines
compared to hTERT-HME cells. Note that cell lines highly expressing
HER2 (see a) express more miR-9-5p, similar to primary tumours of
BALB-NeuT mice and TUBO cells (see Fig. 3b, c), whereas two HER2high/
PGRhigh cell lines, BT474 and T47D, do not express miR-9-5p, similar to
human HER2high/PGRhigh samples (see Fig. 5d). c, High cell density leads
to upregulation of HER2 mRNA (top) or protein levels (bottom) in several
cell lines. Only four cell lines were analysed at the protein level (HER2 level
was not influenced by cell density in CAMA1; HER2 level was regulated by
cell density in HCC1806, MDA-MB-231 and MCF7). Numbers below the
blots denote the fold change of HER2 at high density compared to low density,
normalized to β-actin. d, Expression of miR-9-5p is upregulated by cell
density in SKBR3, HCC1937, HCC1806 and MCF7 cell lines. e, Migration
and sphere-forming potential of 10 out of 16 cell lines grown at low and
high densities, and treated with PIPS or progesterone. The first 7 cell lines
regulate HER2 transcripts by density (see c) and their response to PIPS is
similar to that of TUBO cells and primary mammary cell cultures of BALB-NeuT
mice (see Fig. 2 and Extended Data Fig. 4l, m). The remaining three cell
lines do not regulate HER2 transcripts by cell density, but respond to
progesterone similarly to the TUBO cell line and primary mammary cell
cultures of BALB-NeuT mice (see Fig. 2 and Extended Data Fig. 4l, m). We
did not perform functional assays with BT549 (triple-negative subtype),
T47D (luminal, MCF7-like), MDA-MB-175 and ZR75-1 (luminal, CAMA1-like)
or hTERT-HME (transformed normal, similar to MCF10A) because of
breast cancer subtype redundancy, or with HCC1569 because of poor growth. y axes
show the percentage of migrating cells (left) and observed spheres
(right) relative to seeded cells. Data are mean ± s.d. For gel source data,
see Supplementary Fig. 1.
Extended Data Figure 6 | Differentiation ability and metastasis
formation. a, b, Representative images (a) and quantification (b) of
PGR expression in mammary epithelial cells from wild-type BALB/c
mice at 4, 8, 25 and 40 weeks of age. Scale bars, 100 µm. PGR expression
was reduced by 75% in 40-week-old wild-type mammary glands compared
to 4-week-old mice and disappeared in primary tumours (see also
Extended Data Figs 1f, 2a). n, number of ducts or glands (in early
lesions and normal tissue) or visual fields in primary tumours.
c, d, Representative micrographs of lesions 8 weeks after transplantation
of early lesion spheres resembling DCIS (c) or less-advanced early lesions
(d) displaying PGR expression (brown nuclear staining). e, Tumour
growth from primary tumourspheres in young and old recipients.
f, Differentiation of cells from early lesions, but not of primary tumour cells,
in Matrigel (left) or in sphere culture (right) into acinus-like structures.
Progesterone stimulation accelerated formation of acinus-like structures
by early lesion cells under mammosphere conditions. NED, no evidence
of differentiation. g, Staining for CK8/18, PGR and EpCAM (epithelial
cell adhesion molecule) shows expression of PGR and CK8/18 only in
differentiated structures (top) compared to undifferentiated spheres
(bottom). h, i, Primary tumourspheres were transplanted alone (n = 23)
or co-transplanted with MM3MG-Pgr-B (n = 5) or MM3MG spheres
(n = 5). DCC numbers in bone marrow (h) and the number of mice with
tumours (i) were checked 4 weeks later. Pgr-B-transduced mammary
epithelia suppressed metastatic dissemination and accelerated tumour
formation from primary tumourspheres. j, Pregnancy at the early lesion
stage induces dissemination. A group of young BALB-NeuT mice (n = 5)
was mated at the early lesion stage (7 weeks old) and killed at the end of
pregnancy. These mice did not form palpable tumours, but had a higher
number of DCCs compared to unmated control mice (n = 6). k, Pregnancy
at the advanced tumour stage. A group of BALB-NeuT mice (n = 5) was
mated at the time of in situ carcinoma (15 weeks old) and killed at the end
of pregnancy. All pregnant mice had faster-growing tumours compared
to unmated control mice. l, Schematic of the transplantation protocol for
mammary gland or primary tumour tissue pieces into wild-type recipients.
m, Example of primary tumour and macro-metastasis assessment.
n, Number of metastatic foci in transplanted mice. Eighteen mice from the gland
model and 3 mice from the tumour model were excluded from analyses
owing to the fusion of metastatic lesions, which made it difficult to count
individual lesions. o, Similar growth kinetics of primary tumours from
gland and tumour piece models for samples from the red box in Fig. 4e.
p, q, Mice from o were compared for the duration of the follow-up period
after surgery. Mice from both groups were killed at the first signs of
general health deterioration, which occurred earlier in gland-model mice
(p). Longer follow-up time after curative surgery did not result in more
metastases in recipients transplanted with primary tumour pieces (q).
*P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001; NS, not significant
(Student's t-test (b, f, h, j); Fisher's exact test (i); F-test for the slopes (c, k);
Mann-Whitney U-test (n-q)). Data are shown as mean ± s.d. (b, e, f, k) or
median (h, j, n-q).
Extended Data Figure 7 | Array CGH analysis of primary tumour-metastasis
pairs. a, Number of aberrations detected by array comparative
genomic hybridization in primary tumours and matched lung metastases.
Dot plot with median; statistical analysis by Mann-Whitney U-test.
b, Heatmap of copy number states for the 28 primary tumours and
44 matched metastases across chromosomes 1-19 and X. Light, medium
and dark yellow or blue colours indicate weak, intermediate and strong
amplification (yellow) or deletion (blue) amplitudes, respectively
(thresholds at ±0.1, ±0.2, ±0.3). c, Prototype aberrations (top)
constructed from segmented array CGH profiles (bottom) of the primary
tumour (PT) and the matched metastases (Met 1-3) of mouse 3769
(phylogenetic tree and phylogenetic paths displayed in Fig. 4h, i).
Prototypes (top) are organized in stacked rows per chromosome and
numbered according to chromosome and positional order of their
first change point; for example, 1.2 denotes the second prototype of
chromosome 1. These prototype aberrations are then used to construct
the phylogenetic paths (for example, Fig. 4i) and trees. For better visibility,
small focal aberrations were enlarged to a minimal extension of 300
probes. Yellow, amplification (+1); blue, deletion (-1). Corresponding
segmentation profiles of the normalized and wavy-pattern-corrected
array CGH data (grey dots) are indicated by red lines (bottom). For
segmentation and prototype construction, see Methods. d, Table for
calculating the relative time points of dissemination (Fig. 4j). PT ab,
number of aberrations in the primary tumour; $M_k$ ab, number of
aberrations in the matched metastases (k = 1, 2, 3); PT-$M_k$ cab, number of
common aberrations between primary tumours and metastases; PT-$M_k$
pcab, proportion of common aberrations relative to the primary tumour,
that is, pcab = cab/PT ab.
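
The pcab ratio in panel d is simple arithmetic, so a short Python sketch may help when re-deriving the ranking. Representing aberrations as sets of prototype labels (such as 1.2) and ordering metastases by ascending pcab, read as earlier dissemination, are assumptions made for illustration; the function names are hypothetical and this is not the authors' pipeline.

```python
# Sketch only: aberrations are modelled as sets of prototype labels (for example "1.2"),
# and a lower pcab is read as earlier dissemination. Both are illustrative assumptions.


def pcab(primary: set, metastasis: set) -> float:
    """Proportion of primary-tumour aberrations shared with one metastasis (cab / PT ab)."""
    if not primary:
        raise ValueError("the primary tumour has no aberrations, so pcab is undefined")
    cab = len(primary & metastasis)  # number of common aberrations
    return cab / len(primary)


def rank_by_divergence(primary: set, metastases: dict) -> list:
    """Return (name, pcab) pairs sorted from earliest to latest inferred divergence."""
    scores = {name: pcab(primary, aberrations) for name, aberrations in metastases.items()}
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical prototype labels, not data from the study.
    pt = {"1.1", "1.2", "4.1", "11.3"}
    mets = {"M1": {"1.1", "6.2"}, "M2": {"1.1", "1.2", "4.1", "X.1"}}
    for name, score in rank_by_divergence(pt, mets):
        print(name, score)
```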
Extended Data Figure 8 | Phylogenetic analysis of metastasis (early
divergence). Phylogenetic trees of the top 13 out of 28 primary tumours
and matched metastasis samples listed according to earliest time point of
dissemination (for details see tables in Extended Data Figs 7, 9). A1-7,
inferred common ancestors (intermediates); M, metastases (M1-3);
N, normal cell; P, primary tumour (see Methods). The ordinate indicates
the number of aberrations per profile (on a square-root scale).
For the first three matched samples, the phylogenetic tree paths
(middle), prototype aberrations (top right) and segmented array CGH
profiles (bottom right) are also shown in addition to the phylogenetic tree
(left). Aberration profiles along phylogenetic paths run from N via A1-7
to P or M1-3. Aberration prototypes are named according to chromosome
and positional order of their first change point; for example, 2.2 denotes
the second aberration prototype of chromosome 2 (see Methods and
Extended Data Fig. 7c).
Extended Data Figure 9 | Phylogenetic analysis of metastasis
(late divergence). a, Phylogenetic trees of the top 14 out of 28 primary
tumour and matched metastasis samples listed according to latest time
point of dissemination. For the first three mice the phylogenetic paths
(middle), prototype aberrations (top right) and segmented array CGH
profiles (bottom right) are also shown next to the phylogenetic tree
(left). See Methods and Extended Data Figs 7, 8. b, Summary table of all
phylogenetic analyses indicating the position of the corresponding mouse
phylogenetic tree in Extended Data Figs 8 and 9 (EDF8 and EDF9) for each
primary tumour-metastasis pair. The two bottom rows indicate the rank
and the corresponding relative time point of dissemination as measured
by the proportion of aberrations shared between primary tumour and
metastasis (PT-M pcab; see Extended Data Fig. 7 and Fig. 4j). Note that
only metastases ranked in positions 36-44 diverged late. The phylogenetic
tree and phylogenetic paths for mouse 3769 are displayed in Fig. 4h, i. In
the pos-EDF8 and pos-EDF9 rows, darker colours mark samples for which
all data, including phylogenetic paths, prototype aberrations, segmented
array CGH profiles and phylogenetic trees, are shown. Faint-coloured cells
in pos-EDF8 and pos-EDF9 are samples for which only phylogenetic trees are
shown.
Extended Data Figure 10 | PGR, HER2 signalling and dissemination in
breast cancer patients. a, Double staining of a HER2"8"/PGR"8" human
breast cancer sample (PGR, brown, nucleus; HER2, red/pink, membrane).
Cells with varying expression levels of HER2 and PGR, as well as negative,
single- or double-positive cells, can be seen. Scale bar, 100 μm. Arrows
indicate double-positive (DP), double-negative (DN) and single-positive
(SP) cells. b, Lack of PGR expression in high-density areas of HER284/
PGR"86-classified tumour samples (see Fig. 5c) is directly linked to high
miR-9-5p and miR-30a-5p expression. Data are mean ± s.d.
Evidence for a spinon Fermi surface in a triangular-lattice quantum-spin-liquid candidate


Yao Shen, Yao-Dong Li, Hongliang Wo, Yuesheng Li, Shoudong Shen, Bingying Pan, Qisi Wang, H. C. Walker,
P. Steffens, M. Boehm, Yiqing Hao, D. L. Quintero-Castro, L. W. Harriger, M. D. Frontzek, Lijie Hao, Siqin Meng,
Qingming Zhang, Gang Chen & Jun Zhao


A quantum spin liquid is an exotic quantum state of matter in 
which spins are highly entangled and remain disordered down to 
zero temperature. Such a state of matter is potentially relevant to 
high-temperature superconductivity and quantum-information 
applications, and experimental identification of a quantum spin 
liquid state is of fundamental importance for our understanding 
of quantum matter. Theoretical studies have proposed various 
quantum-spin-liquid ground states'*, most of which are 
characterized by exotic spin excitations with fractional quantum 
numbers (termed ‘spinons’). Here we report neutron scattering 
measurements of the triangular-lattice antiferromagnet YbMgGaO4
that reveal broad spin excitations covering a wide region of the 
Brillouin zone. The observed diffusive spin excitation persists at 
the lowest measured energy and shows a clear upper excitation 
edge, consistent with the particle-hole excitation of a spinon Fermi 
surface. Our results therefore point to the existence of a quantum 
spin liquid state with a spinon Fermi surface in YbMgGaO4, which
has a perfect spin-1/2 triangular lattice as in the original proposal* 
of quantum spin liquids. 

In 1973, Anderson‘ proposed the idea of a quantum spin liquid 
(QSL) in the study of the triangular-lattice Heisenberg antiferromagnet. 
This idea was revived after the discovery in 1986 of high-temperature 
superconductivity*. A QSL, as currently understood, does not fit into 
Landau’s conventional paradigm of symmetry-breaking phases’*”, 
and is instead an exotic state of matter characterized by spinon excita- 
tions and emergent gauge structures'**. The search for QSLs in models
and materials*” has been partly facilitated by the
Oshikawa-Hastings-Lieb-Schultz-Mattis (OHLSM) theorem, which hints at the possibility
of QSLs in Mott insulators with odd electron fillings and a global U(1) 
spin rotational symmetry'*-'’. Indeed, a continuum of spin excitations 
has been observed in the kagome-lattice material ZnCu3(OD)6Cl2
(refs 12, 16). However, the requirement of the U(1) spin rotational 
symmetry prevents the application of the OHLSM theorem in 
strongly spin-orbit-coupled Mott insulators, in which the spin 
rotational symmetry is completely absent. A recent theory addressed 
this limitation of the OHLSM theorem, arguing that, as long as time- 
reversal symmetry is preserved, the ground state of a spin-orbit- 
coupled Mott insulator with odd electron fillings must be exotic!’. 

The triangular antiferromagnet YbMgGaO4 (refs 18, 19) displays no
indication of magnetic ordering or symmetry breaking at temperatures
as low as 30 mK, despite the energy scale for spin interaction being
equivalent to a temperature of approximately 4 K. Because of the strong
spin-orbit coupling of the Yb electrons, YbMgGaO4 was the first QSL
candidate to be proposed that lies outside the scope of the OHLSM theorem’. The
thirteen 4f electrons of the Yb3+ ion form spin-orbit-entangled
Kramers doublets that are split by the D3d crystal electric fields*”-*”.
At temperatures considerably lower than the crystal field gap (about
420 K), the magnetic properties of YbMgGaO4 are captured by the
ground-state doublet, which is described by an effective spin-1/2 local
moment; this is confirmed by a measured magnetic entropy of R ln(2)
per Yb3+ ion!8, where R is the ideal gas constant. Figure 1a, b shows that
the YbO6 octahedra form well-separated triangular layers. Because of
the large difference in chemistry between Yb3+ and the non-magnetic
Mg2+/Ga3+ ions, intra-triangular-layer impurities are prevented
in YbMgGaO4 (refs 18, 19, 21). Hence, the Yb system is a spin-1/2
antiferromagnet on a perfect triangular lattice.
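
As a reference value for the entropy argument above, a Kramers doublet (an effective spin-1/2) carries R ln 2 ≈ 5.76 J mol⁻¹ K⁻¹. The minimal Python check below only evaluates that number from the exact SI constants; it uses no data from this work, and the function name is hypothetical.

```python
import math

# Molar gas constant R = N_A * k_B in J mol^-1 K^-1 (both factors are exact in the 2019 SI).
R = 6.02214076e23 * 1.380649e-23


def multiplet_entropy(degeneracy: int = 2) -> float:
    """Entropy R ln(g) released per mole of moments whose ground state is g-fold degenerate."""
    return R * math.log(degeneracy)


# A Kramers doublet (g = 2) corresponds to an effective spin-1/2.
print(f"R ln 2 = {multiplet_entropy():.3f} J mol^-1 K^-1")
```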
Figure 1 | Crystal structure and magnetic susceptibility of a single
crystal of YbMgGaO4. a, Schematic of the YbMgGaO4 crystal structure.
The dashed line indicates the unit cell. b, A triangular layer of Yb3+
ions and oxygen. c, Direct-current magnetic susceptibility χ measured
under zero-field cooling (ZFC) and field cooling (FC) for single crystals
of YbMgGaO4, under magnetic fields (H = 1 T) applied perpendicular
and parallel to the c axis. Paramagnetic behaviour is observed at low
temperature with no obvious differences between ZFC and FC data. The
inset shows the inverse susceptibility 1/χ at low temperature (< 20 K),
fitted with the Curie-Weiss law (dashed line). The fitting results in Curie-Weiss
temperatures of $\theta_{\mathrm{CW},\perp} = -4.78$ K and $\theta_{\mathrm{CW},\parallel} = -3.2$ K for perpendicular
and parallel magnetic fields, respectively.
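
The Curie-Weiss analysis in panel c amounts to a straight-line fit of 1/χ against T, because 1/χ = (T − θ_CW)/C. The Python/NumPy sketch below recovers θ_CW and the Curie constant C from synthetic, noise-free data generated with θ_CW = −4.78 K. The data, the 2-20 K window and the function name are illustrative assumptions. In practice the Van Vleck contribution would be removed from χ before fitting, as described in Methods.

```python
import numpy as np


def fit_curie_weiss(temperature, inverse_chi):
    """Fit 1/chi = (T - theta_CW) / C by linear least squares; return (theta_CW, C)."""
    slope, intercept = np.polyfit(temperature, inverse_chi, 1)
    curie_constant = 1.0 / slope
    theta_cw = -intercept / slope  # temperature at which the fitted 1/chi extrapolates to zero
    return theta_cw, curie_constant


# Synthetic, noise-free inverse susceptibility (NOT the measured data):
# theta_CW = -4.78 K and C = 1.0 in arbitrary units, sampled below 20 K.
T = np.linspace(2.0, 20.0, 37)
inverse_chi = (T + 4.78) / 1.0
theta, C = fit_curie_weiss(T, inverse_chi)
print(f"theta_CW = {theta:.2f} K, C = {C:.2f}")  # a negative theta_CW signals antiferromagnetic coupling
```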
Figure 2 | Measured and calculated momentum dependence of the spin
excitations, and calculated spinon Fermi surface of YbMgGaO4.
a-e, Constant-energy images at the indicated energies and T = 70 mK,
displaying diffusive magnetic excitations covering a wide region of the
Brillouin zone. The scattering intensity is represented by the colour scale,
and in c, d and e has been multiplied by 2, 4 and 8, respectively, for clarity.
The data were collected on ThALES using the Flatcone detector, and were
corrected for neutron-beam self-attenuation (Methods). f, Calculated
momentum dependence of the spin excitations for a typical, finite E.
Here t is the hopping amplitude between nearest-neighbour sites.
g, Spinon Fermi surface calculated using the model described in the
main text. The black arrow indicates a spinon particle-hole excitation
with momentum transfer p and dashed lines indicate the Brillouin zone
boundaries of the conventional unit cell (a = b = 3.40 Å, c = 25.12 Å).
High-symmetry points M, K and Γ are labelled by red, green and blue
dots, respectively. The wave vector Q is defined as Q = Ha* + Kb* + Lc*;
a.u., arbitrary units; r.l.u., reciprocal lattice units.

To characterize the behaviour of the local moment of Yb, we first
measured the magnetic susceptibility of single-crystalline YbMgGaO4
(Fig. 1c). For magnetic fields H applied both parallel and perpendicular to
the c axis of the lattice, we found predominantly antiferromagnetic
spin interactions, as evidenced by negative Curie-Weiss temperatures
(Fig. 1c, inset). Because of the anisotropic spin interaction, the
Curie-Weiss temperatures for H ⊥ c and H ∥ c were not identical (Fig. 1c,
inset; Extended Data Fig. 1f), with $\theta_{\mathrm{CW},\perp} = -4.78$ K and
$\theta_{\mathrm{CW},\parallel} = -3.2$ K, consistent with previous measurements'*’*. We examined
the magnetic susceptibilities in field-cooling (FC) and zero-field-cooling
(ZFC) measurements. No splitting was detected between the
FC and ZFC results down to 2 K, indicating the absence of spin-glass
transitions (Fig. 1c).

The Curie-Weiss temperature and the spin excitation bandwidth
(discussed below) set the energy scale for the spin interactions. Our
elastic neutron scattering measurements revealed no magnetic Bragg
peaks (Extended Data Fig. 2) at temperatures as low as 30 mK,
considerably lower than the Curie-Weiss temperature (about 4 K) and
spin excitation bandwidth (about 17 K); this is consistent with
previous measurements of specific heat and susceptibility. To reveal the
intrinsic quantum dynamics of the local moments of Yb, we used
inelastic neutron scattering (INS) to study the spin excitations in
single crystals of YbMgGaO4 at approximately 70 mK. Constant-energy
images are presented in Fig. 2a-e, which indicate the presence of
diffusive magnetic excitations for all measured energies. The scattering
spectral weight is spread broadly in the Brillouin zone, whereas
the spectral intensity near the zone centre (that is, the Γ point) is
suppressed. For a low-energy transfer of 0.3 meV, the spectral intensity
is slightly more pronounced around the M points, while the broad
continuum across the Brillouin zone still carries the vast majority of the
spectral weight (Fig. 2a).

Figure 3a displays a contour plot of spectral intensity along the
high-symmetry momentum directions (M-K-Γ-M-Γ) in energy-momentum
(E-Q) space. Similarly to the constant-energy images
shown in Fig. 2a-e, the spectral intensity is broadly distributed
in momentum for all of the energies measured. Moreover, a clear
V-shaped upper bound on the excitation energy is evident near the
Γ point (Fig. 3a, dotted line). The intensity of the spin excitation
gradually decreases with increasing energy, and vanishes above
approximately 1.5 meV. This feature is confirmed by the Q scans in
Fig. 4a, b and the E scans at a few given momentum points (Γ, M and K)
in Fig. 4c.
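
The energy scales in this Letter are quoted both in kelvin and in meV, related by E = k_B T. The short Python conversion below relies only on the CODATA value of k_B. It shows that the ~1.5 meV cutoff of the continuum corresponds to about 17 K, the spin excitation bandwidth quoted above.

```python
# Boltzmann constant in meV per kelvin (CODATA value 8.617333262e-5 eV/K, truncated).
K_B_MEV_PER_K = 8.617333262e-2


def mev_to_kelvin(energy_mev: float) -> float:
    """Convert an energy in meV to the equivalent temperature E / k_B in kelvin."""
    return energy_mev / K_B_MEV_PER_K


for label, energy in [("upper cutoff of the continuum", 1.5), ("low-energy transfer in Fig. 2a", 0.3)]:
    print(f"{label}: {energy} meV ~ {mev_to_kelvin(energy):.1f} K")
```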

Figure 3 | Intensity of the spin-excitation spectrum along the high-symmetry
momentum directions. a, Contour plot of the intensity along
the (1/2 − K/2, K, 0) and (1, K, 0) directions illustrated by the black
lines in b. The y axis represents the energy transfer. Vertical dashed lines [...]
bounds on the energy of the spin excitations. b, Sketch of reciprocal space.
Dashed lines indicate the Brillouin zone boundaries.

The broad continuum is an immediate consequence, and strong
evidence, of spinon excitations in QSLs!”"”. This differs from magnon-like
excitations that would peak strongly at specific momenta in reciprocal
space, with or without static magnetic order”**. In general, the
spinful excitations in QSLs are carried by deconfined spinons!’. For
most experimentally relevant QSLs, the spinons carry half-integer
spins. One neutron-spin-flip event in an INS measurement creates an
integer spin change that necessarily excites two (or more) spinons'.
Therefore, the energy transfer E and momentum transfer p of the
neutron are shared by the two spinons created by the neutron spin flip.
According to energy-momentum conservation,
$E(\mathbf{p}) = \omega_s(\mathbf{k}) + \omega_s(\mathbf{p}-\mathbf{k})$, where $\omega_s(\mathbf{k})$ is the spinon dispersion
and k is the momentum of one spinon. Because k is not fixed by the measurement,
a given momentum transfer p corresponds to a range of energies, so this relation
implies the presence of an excitation continuum in the INS spectrum. The broad
continua in Figs 2a-e and 3a at different energies are as expected for
the continuum excitations of two spinons.

The broad neutron-scattering spectral intensity that persists to the
lowest energy that we measured suggests a high density of spinon
scattering states at low energies. This cannot be explained by a Dirac
QSL?”5, in which the spectral continuum at low energies would
concentrate near a few discrete momenta that connect the Dirac cones
(Methods), or by any simple gapped QSL. Because of the gap, the
spectral intensity of a gapped QSL would appear only above a threshold energy.
Even if the gap were smaller than the lowest measured energy, the spinon
excitations would, except under special circumstances'®, occupy only one
or a few discrete spots in reciprocal space at low energies”®, gradually
expanding with increasing energy; a broad continuum at all energies
and diminishing spectral weight at Γ (Fig. 3a) would not be observed.
Moreover, Dirac and gapped QSLs are inconsistent with the observed
low-temperature sublinear power-law behaviour of the heat capacity’.
In contrast, the spinon-Fermi-surface QSL, with a high density of
spinon states near the spinon Fermi surface, provides a consistent
explanation for the INS results of YbMgGaO4.

To account for these possible QSL signatures in YbMgGaO4, we
consider a minimal mean-field spinon Hamiltonian with a uniform
spinon hopping on the triangular lattice. With a zero background flux
for the spinons, the spinons form a large Fermi surface in the Brillouin
zone (Fig. 2g). Although the anisotropic spin exchange caused by
the spin-orbit coupling!*-*! breaks the spin-rotational symmetry assumed in
this simple model, the mean-field state considered here captures the
essential properties of the spinon-Fermi-surface QSL in this system.
For this spinon Fermi surface state, one neutron spin flip excites one
spinon particle-hole pair across the Fermi surface. Therefore, the
dynamic spin structure factor S(p, E), measured by INS, directly
probes the spinon particle-hole excitations across the spinon Fermi
surface (Fig. 2g).
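
The two-spinon kinematics discussed earlier in this section, E(p) = ω_s(k) + ω_s(p − k), can be made concrete numerically. The Python/NumPy sketch below uses a hypothetical gapped spinon dispersion on the triangular lattice. The functional form, gap = 0.2t and the grid size are illustrative assumptions, not a model of YbMgGaO4. The sketch shows that a single momentum transfer p maps onto a finite band of energies rather than a sharp line.

```python
import numpy as np

# Three bond vectors of a triangular lattice with unit lattice constant.
BONDS = np.array([[1.0, 0.0], [-0.5, np.sqrt(3) / 2], [-0.5, -np.sqrt(3) / 2]])
# Reciprocal vectors dual to a1 = (1, 0) and a2 = (-1/2, sqrt(3)/2).
B1 = 2 * np.pi * np.array([1.0, 1.0 / np.sqrt(3)])
B2 = 2 * np.pi * np.array([0.0, 2.0 / np.sqrt(3)])


def omega_s(k, gap=0.2, t=1.0):
    """Hypothetical gapped spinon dispersion (illustration only): gap + t * (3 - sum cos(k . a))."""
    return gap + t * (3.0 - np.cos(k @ BONDS.T).sum(axis=-1))


def two_spinon_band(p, n=60):
    """Lower and upper edges of E(p) = omega_s(k) + omega_s(p - k) over an n x n grid of k."""
    frac = np.arange(n) / n
    u, v = np.meshgrid(frac, frac, indexing="ij")
    k = u[..., None] * B1 + v[..., None] * B2  # (n, n, 2) momenta covering one Brillouin zone
    energies = omega_s(k) + omega_s(np.asarray(p, dtype=float) - k)
    return energies.min(), energies.max()


for p in ([0.0, 0.0], [np.pi, 0.0]):
    lower, upper = two_spinon_band(p)
    print(f"p = {p}: two-spinon energies span {lower:.2f} to {upper:.2f} (units of t)")
```

For p = 0 the lower edge equals twice the assumed gap, because both spinons can sit at the band minimum at k = 0.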

For low E, a minimum momentum transfer $p_{\min} \approx E/v_F$, where $v_F$ is
the Fermi velocity, is required to excite the spinon particle-hole pairs
(equivalently, a pair with small momentum p carries at most an energy of about $v_F p$).
Therefore, the spectral intensity near the Γ point should be gradually
suppressed with increasing energy, leading to an upper bound on the
excitation energy near the Γ point, as is clearly observed in Fig. 3a
(V-shaped dotted line). For a typical, finite E, the calculated spectrum
based on the spinon particle-hole continuum is shown in Fig. 2f. This
spectrum is qualitatively consistent with the experimental observation
of the broad spinon continuum in reciprocal space. Finally, when
E exceeds the spinon bandwidth, the single spinon particle-hole
excitation process is suppressed, and the spinon excitation intensity is
suppressed accordingly. This feature is consistent with the vanishing of
the spectral intensity above about 1.5 meV (dotted line) seen in Figs 3a
and 4c. Therefore, we propose that YbMgGaO4 is a QSL with a spinon
Fermi surface.
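
The V-shaped edge can be checked on the free mean-field band from Methods, $\omega_{\mathbf{k}} = -t\sum_{j}\cos(\mathbf{k}\cdot\mathbf{a}_j) - \mu$ at half filling. The Python/NumPy sketch below finds, on a finite grid, the largest particle-hole energy available at small momentum transfer p. The grid size L = 120, the restriction of p to one reciprocal-lattice direction and the T = 0 occupations are assumptions of the sketch, and the helper names are hypothetical. At small |p|, the ratio of the edge to |p| should stay roughly constant, up to grid discretization. That is the statement $E_{\max} \approx v_F p$, or equivalently $p_{\min} \approx E/v_F$.

```python
import numpy as np

# Six nearest-neighbour vectors of the triangular lattice (unit lattice constant).
ANGLES = np.arange(6) * np.pi / 3
NEIGHBOURS = np.stack([np.cos(ANGLES), np.sin(ANGLES)], axis=1)
# Reciprocal vectors dual to a1 = (1, 0) and a2 = (1/2, sqrt(3)/2).
B1 = 2 * np.pi * np.array([1.0, -1.0 / np.sqrt(3)])
B2 = 2 * np.pi * np.array([0.0, 2.0 / np.sqrt(3)])


def spinon_energies(L=120, t=1.0):
    """xi_k = omega_k on an L x L grid, with mu chosen for half filling (median energy)."""
    i, j = np.meshgrid(np.arange(L), np.arange(L), indexing="ij")
    k = (i[..., None] * B1 + j[..., None] * B2) / L
    eps = -t * np.cos(k @ NEIGHBOURS.T).sum(axis=-1)
    return eps - np.median(eps)


def upper_edge(xi, m):
    """Largest particle-hole energy xi_{k+p} - xi_k for p = m * B1 / L at T = 0."""
    xi_kp = np.roll(xi, -m, axis=0)  # xi at k + p on the periodic grid
    allowed = (xi < 0) & (xi_kp > 0)  # occupied k, empty k + p
    return float((xi_kp - xi)[allowed].max()) if allowed.any() else 0.0


xi = spinon_energies()
L = xi.shape[0]
for m in (1, 2, 4, 8):
    p = m * np.linalg.norm(B1) / L
    edge = upper_edge(xi, m)
    print(f"|p| = {p:.3f}: upper edge {edge:.3f} t, edge/|p| = {edge / p:.2f}")
```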

The spinon Fermi surface alone has a constant density of states
and would give a heat capacity that depends linearly on temperature
($C_v \propto T$). To account for the $C_v \propto T^{2/3}$ behaviour in YbMgGaO4”, we
further propose that the candidate QSL is a spinon-Fermi-surface U(1)
QSL, where the strong U(1) gauge fluctuation induces a self-energy
correction in the spinons, thus enhancing the low-energy density of
states*”??.

Figure 4 | Constant-energy scans along the symmetry directions and
constant-Q scans at the high-symmetry points. a, b, Constant-energy
scans along the (1/2 − K/2, K, 0) and (1, K, 0) directions. The solid lines
are guides to the eye. c, Constant-Q scans at the M, K and Γ points with
the final energy $E_f$ fixed at 3 meV, 3.5 meV and 4 meV, as indicated.
The sharp upturn of the scattering below about 0.1 meV is due to
contamination from incoherent elastic scattering at E = 0 meV (dashed
line, $E_f$ = 3 meV). Error bars, 1 s.d.

During the review of the Letter, a related preprint’ appeared that 
discusses the role of the next-nearest-neighbour interaction in the 
formation of the QSL state in YbMgGaO4.
METHODS


Sample growth and characterizations. High-quality YbMgGaO4 single crystals
were synthesized using the optical floating zone technique!”. A representative
single crystal, which is optically transparent with mirror-like cleaved surfaces, is
shown in Extended Data Fig. 1a. Our X-ray diffraction (XRD) measurements
revealed that all of the reflections from the cleaved surface could be indexed by
(0, 0, L) peaks of triangular YbMgGaO4; no impurity phases were observed
(Extended Data Fig. 1b). The full width at half maximum (FWHM) of the rocking
curve of the (0, 0, 18) peak was about 0.009°, indicating an extremely high
crystallization quality (Extended Data Fig. 1c). This was confirmed by the sharp and clear
diffraction spots in the X-ray Laue pattern (Extended Data Fig. 1d). Powder XRD
patterns on ground single crystals also revealed no indication of impurity phases
(Extended Data Fig. 1e). The Rietveld refinements*! confirm that the XRD pattern
can be described by the R-3m space group. The refined structural parameters are
given in Extended Data Table 1. These results suggest that the YbMgGaO4 single
crystal possesses a perfect triangular lattice with no detectable impurities. This is
consistent with previous measurements showing that the concentration of impurity or
isolated spins is less than 0.04% in similar samples'*!°. Although the Mg/Ga site
disorder in the non-magnetic layers does not directly affect the exchange
interaction between the Yb local moments, it may have an indirect effect and could lead
to some exchange disorder. This disorder does not appear to be significant, because
no signs of spin freezing were observed. A QSL is often stable against weak local
perturbations, provided that the perturbation is irrelevant (in the renormalization-group
sense) or weak. Therefore, if a QSL is realized as the ground state of YbMgGaO4, the possible
exchange disorder will not destabilize this state as long as the disorder is weak.

In addition, the field dependence of magnetization in our single crystal 
displayed a linear behaviour above 12 T (Extended Data Fig. 1f), indicative 
of a fully polarized state. The Van Vleck susceptibility, extracted from the slope of this
linear high-field magnetization, was subtracted from the data shown in the inset of Fig. 1c.
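
The Van Vleck subtraction can be sketched as a linear fit of M(H) in the polarized regime, where M = M_sat + χ_VV H. The Python/NumPy example below applies the 12 T threshold stated above to a synthetic magnetization curve. The tanh saturation, M_sat = 1.6 and χ_VV = 0.01 in arbitrary units, and the function name are hypothetical.

```python
import numpy as np


def van_vleck_from_high_field(field, magnetization, field_min=12.0):
    """Fit M = M_sat + chi_VV * H above field_min, where the moments are fully polarized.

    Returns (chi_VV, M_sat); the slope chi_VV is the temperature-independent
    Van Vleck contribution.
    """
    mask = field > field_min
    chi_vv, m_sat = np.polyfit(field[mask], magnetization[mask], 1)
    return chi_vv, m_sat


# Synthetic M(H) (NOT measured data): a saturating term plus a linear Van Vleck term.
H = np.linspace(0.0, 14.0, 141)  # tesla
M = 1.6 * np.tanh(H / 2.0) + 0.01 * H  # hypothetical values, arbitrary units
chi_vv, m_sat = van_vleck_from_high_field(H, M)
print(f"chi_VV = {chi_vv:.4f}, M_sat = {m_sat:.3f}")

# The spin susceptibility would then be chi_spin(T) = chi_measured(T) - chi_VV.
```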
Neutron scattering experiments. INS measurements were carried out on the
ThALES cold triple-axis spectrometer at the Institut Laue-Langevin, Grenoble,
France, and on the FLEXX cold triple-axis spectrometer at the BER-II reactor at
Helmholtz-Zentrum Berlin, Germany. For the ThALES experiment, silicon (111)
was used as a monochromator and analyser; the final neutron energies were fixed at
$E_f$ = 3 meV (energy resolution of about 0.05 meV), $E_f$ = 3.5 meV (energy resolution
of about 0.08 meV) or $E_f$ = 4 meV (energy resolution of about 0.1 meV). For the
FLEXX experiment, pyrolytic graphite (002) was used as a monochromator and
analyser. Contamination from higher-order neutrons was eliminated with a
velocity selector installed in front of the monochromator. The final neutron
energy was fixed at $E_f$ = 3.5 meV (energy resolution of about 0.09 meV). Three (six)
single crystals with a total mass of about 5 g (19 g) were coaligned in the
(HK0) scattering plane for the ThALES (FLEXX) experiment. The FWHMs of the
rocking curves of the coaligned crystals for the ThALES and FLEXX experiments
were approximately 0.95° and 0.92°, respectively. The elastic neutron scattering
experiment was carried out on the WAND neutron diffractometer at the High
Flux Isotope Reactor, Oak Ridge National Laboratory, USA; one single crystal was
used for the experiment, with an incident wavelength λ = 1.488 Å (Extended Data
Fig. 2). For the low-temperature experiments, a dilution insert for the standard 4He
cryostat was used to reach temperatures down to around 30-70 mK.

Because of the non-uniform shape of the single crystal, the relatively large
sample volume and the extremely broad spin-excitation spectrum, the neutron
beam self-attenuation (by the sample) needs to be taken into account. In most cases
the self-attenuation depends only on the distance traversed by the neutrons
through the sample. We observed the self-attenuation effect in an elastic incoherent
scattering image of our sample at 20 K, which exhibited an anisotropic intensity
distribution (Extended Data Fig. 3a). The self-attenuation effect was also observed
in the raw constant-energy images (Extended Data Fig. 3b-f), which were
anisotropic, with slightly higher intensities occurring at approximately
the same direction as that observed in the elastic incoherent scattering image.
The self-attenuation can be corrected by normalizing the data with the elastic
incoherent scattering image; that is, the elastic incoherent scattering intensity,
which depends on the sample position (ω) and scattering angle (2θ), is
converted to a linear attenuation correction factor for the scattering images
measured at different energies. The normalized constant-energy images are
presented in Fig. 2a-e, revealing a nearly isotropic intensity distribution.
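
One way to implement the normalization described above is to divide each constant-energy image by the elastic incoherent image, rescaled to unit mean, on the same (ω, 2θ) pixel grid. This Python/NumPy sketch assumes that a single attenuation map applies to all energy transfers and uses a synthetic attenuation pattern with hypothetical function names. It illustrates the idea and does not reproduce the authors' data reduction.

```python
import numpy as np


def attenuation_map(incoherent_image):
    """Relative transmission for each (omega, 2theta) pixel, normalized to a mean of 1.

    Elastic incoherent scattering would be isotropic without attenuation, so its
    anisotropy is attributed to self-attenuation along the neutron path.
    """
    return incoherent_image / incoherent_image.mean()


def correct_image(raw_image, incoherent_image):
    """Divide a constant-energy image by the attenuation map on the same pixel grid."""
    return raw_image / attenuation_map(incoherent_image)


# Synthetic check (hypothetical numbers): an isotropic signal distorted by a smooth
# attenuation pattern becomes uniform again after the correction.
omega = np.linspace(0.0, np.pi, 50)[:, None]
two_theta = np.linspace(0.2, 2.0, 40)[None, :]
transmission = np.exp(-0.3 * (1.0 + np.cos(omega)) * two_theta)
raw = 5.0 * transmission  # true signal is uniform (5.0)
incoherent = 2.0 * transmission  # the incoherent intensity sees the same attenuation
corrected = correct_image(raw, incoherent)
print(f"relative spread before: {raw.std() / raw.mean():.3f}, after: {corrected.std() / corrected.mean():.1e}")
```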

Extended Data Fig. 4 shows the spin excitation spectrum at 20 K, which is 
broadened and weakened compared with that at 70 mK (discussed below). 
Spinon Fermi surface and dynamic spin structure factor. Here we explain the 
spinon mean-field state that is used to explain the dynamic spin structure factor 
of the neutron scattering experiments. As we proposed in the main text, a QSL 
with a spinon Fermi surface gives a compatible explanation for the INS results 
for YbMgGaO4.
To describe the candidate spinon-Fermi-surface QSL state in YbMgGaO4, we
formally express the Yb3+ effective spin as a bilinear combination of the
fermionic spinons, $\mathbf{S}_i = \frac{1}{2}\sum_{\alpha\beta} f^{\dagger}_{i\alpha}\boldsymbol{\sigma}_{\alpha\beta} f_{i\beta}$,
with the Hilbert space constraint $\sum_{\alpha} f^{\dagger}_{i\alpha} f_{i\alpha} = 1$, where $\boldsymbol{\sigma}$ is a vector
whose three components are the Pauli matrices and $f^{\dagger}_{i\alpha}$ ($f_{i\alpha}$) creates (annihilates)
a spinon with spin $\alpha = \uparrow, \downarrow$ at site i. For the QSL with a spinon Fermi surface, we
propose a minimal mean-field Hamiltonian $H_{\mathrm{MF}}$ for the spinons on the triangular lattice.
We consider a uniform spinon hopping with a zero background flux:

$$H_{\mathrm{MF}} = -t\sum_{\langle ij\rangle,\alpha}\left(f^{\dagger}_{i\alpha} f_{j\alpha} + \mathrm{h.c.}\right) - \mu\sum_{i,\alpha} f^{\dagger}_{i\alpha} f_{i\alpha} \qquad (1)$$


where t is the mean-field parameter, which represents the hopping amplitude
between nearest-neighbour sites. The chemical potential μ is included to impose
the Hilbert space constraint on average. Here we treat the spinons as free particles,
neglecting the gauge fluctuations. This mean-field state gives a single-spinon
dispersion

$$\omega_{\mathbf{k}} = -t\sum_{\{\mathbf{a}_j\}}\cos(\mathbf{k}\cdot\mathbf{a}_j) - \mu,$$

where $\{\mathbf{a}_j\}$ are the six nearest-neighbour vectors of the triangular lattice. Owing to the
Hilbert space constraint (one spinon per site, shared between two spin species), the spinon
band is half-filled, leading to a large Fermi surface in the Brillouin zone (Extended Data Fig. 5a).
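
The half-filled mean-field band can be reproduced with a few lines of Python/NumPy. The sketch below assumes a unit lattice constant, t = 1 and the 40 × 40 momentum grid used for Fig. 2f. It fixes μ at the median band energy, so that half of the states of each spin species are occupied. The helper names are hypothetical.

```python
import numpy as np

# Six nearest-neighbour vectors a_j of the triangular lattice (unit lattice constant).
ANGLES = np.arange(6) * np.pi / 3
NEIGHBOURS = np.stack([np.cos(ANGLES), np.sin(ANGLES)], axis=1)
# Reciprocal vectors dual to a1 = (1, 0) and a2 = (1/2, sqrt(3)/2).
B1 = 2 * np.pi * np.array([1.0, -1.0 / np.sqrt(3)])
B2 = 2 * np.pi * np.array([0.0, 2.0 / np.sqrt(3)])


def k_grid(L):
    """L x L momenta k = (i * B1 + j * B2) / L covering one Brillouin zone (periodic boundaries)."""
    i, j = np.meshgrid(np.arange(L), np.arange(L), indexing="ij")
    return (i[..., None] * B1 + j[..., None] * B2) / L


def dispersion(k, t=1.0, mu=0.0):
    """omega_k = -t * sum_j cos(k . a_j) - mu over the six neighbour vectors."""
    return -t * np.cos(k @ NEIGHBOURS.T).sum(axis=-1) - mu


def half_filling_mu(L=40, t=1.0):
    """Chemical potential that leaves half of the states of each spin species occupied."""
    return float(np.median(dispersion(k_grid(L), t)))


L, t = 40, 1.0
mu = half_filling_mu(L, t)
omega = dispersion(k_grid(L), t, mu)
print(f"mu = {mu:.3f} t, band from {omega.min() + mu:.2f} t to {omega.max() + mu:.2f} t")
print(f"occupied fraction per spin: {np.mean(omega < 0):.3f}")
```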

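INS measures the dynamic spin structure factor

$$S(\mathbf{p},E) = \frac{1}{N}\sum_{ij} e^{i\mathbf{p}\cdot(\mathbf{r}_i-\mathbf{r}_j)}\int\frac{dt}{2\pi}\,e^{iEt}\,\langle \mathbf{S}_i(t)\cdot\mathbf{S}_j(0)\rangle = \sum_n \delta\big(E-[E_n(\mathbf{p})-E_0]\big)\,\big|\langle n|\mathbf{S}_{\mathbf{p}}|\Omega\rangle\big|^2 \qquad (2)$$

where N is the total number of lattice sites, the summation goes over all eigenstates,
$|\Omega\rangle$ refers to the spinon ground state with the spinons filling the Fermi sea, $E_0$ is
the energy of the ground state and $E_n(\mathbf{p})$ is the energy of the nth excited state with
momentum p. In the actual calculation, owing to the energy resolution of the
experiments, the δ function is taken to have a broadening:

$$\delta\big(E-[E_n(\mathbf{p})-E_0]\big) \rightarrow \frac{1}{\pi}\,\frac{\eta}{\big(E-[E_n(\mathbf{p})-E_0]\big)^2+\eta^2},$$

where η is the broadening and E is the measured energy. Because
$\mathbf{S}_{\mathbf{p}} = \frac{1}{2\sqrt{N}}\sum_{\mathbf{k},\alpha\beta} f^{\dagger}_{\mathbf{k}+\mathbf{p},\alpha}\boldsymbol{\sigma}_{\alpha\beta} f_{\mathbf{k}\beta}$,
the summation in equation (2) would be over all possible spin-1 excited states that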
are characterized by one spinon particle-hole pair crossing the spinon Fermi 
surface (Fig. 2g) with a total momentum p and a total energy E. As we show in 
Fig. 2f and Extended Data Fig. 5b, and discuss in the main text, this spinon-Fermi-surface
QSL state gives the three crucial features of the INS results: (1) the broad
continuum that covers a large portion of the Brillouin zone; (2) the broad
continuum persisting from the lowest energy transfer to the highest energy
transfer; and (3) the clear upper excitation edge near the Γ point.

In our calculation of Fig. 2f and Extended Data Fig. 5b, we choose the lattice
size to be 40 × 40 and $\eta = 1.2t$, in accordance with the energy and momentum
resolution of the instruments. The energy scale of Fig. 2f is set to be 7.5t.
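
A free-spinon evaluation of equation (2) can be sketched in Python/NumPy by summing one particle-hole pair per occupied momentum, with the δ function replaced by a Lorentzian of half-width η. The sketch uses the 40 × 40 grid and η = 1.2t quoted above. Several choices are assumptions of this illustration, not the authors' code:

- T = 0 occupations.
- Momentum transfers restricted to grid momenta.
- A spin-sum prefactor of 3/2, which follows from the bilinear form of $\mathbf{S}_{\mathbf{p}}$.

The function names are hypothetical.

```python
import numpy as np

ANGLES = np.arange(6) * np.pi / 3
NEIGHBOURS = np.stack([np.cos(ANGLES), np.sin(ANGLES)], axis=1)  # six a_j, unit lattice constant
B1 = 2 * np.pi * np.array([1.0, -1.0 / np.sqrt(3)])
B2 = 2 * np.pi * np.array([0.0, 2.0 / np.sqrt(3)])


def half_filled_energies(L=40, t=1.0):
    """xi_k = omega_k on an L x L grid with mu fixed at half filling (median energy)."""
    i, j = np.meshgrid(np.arange(L), np.arange(L), indexing="ij")
    k = (i[..., None] * B1 + j[..., None] * B2) / L
    eps = -t * np.cos(k @ NEIGHBOURS.T).sum(axis=-1)
    return eps - np.median(eps)


def structure_factor(xi, m1, m2, energies, eta=1.2):
    """Free-spinon S(p, E) for p = (m1 * B1 + m2 * B2) / L, in units where t = 1.

    S(p, E) = (3 / 2N) * sum_k n_k (1 - n_{k+p}) * Lorentzian(E - (xi_{k+p} - xi_k)),
    i.e. one spinon particle-hole pair per neutron spin flip. The factor 3/2 comes from
    summing |<n|S^a_p|Omega>|^2 over the three spin components and both spin species.
    """
    xi_kp = np.roll(xi, shift=(-m1, -m2), axis=(0, 1))  # xi at k + p (periodic grid)
    allowed = (xi < 0) & (xi_kp > 0)  # occupied k, empty k + p at T = 0
    pair_energies = (xi_kp - xi)[allowed]
    e = np.asarray(energies, dtype=float)[:, None]
    lorentzian = (eta / np.pi) / ((e - pair_energies[None, :]) ** 2 + eta**2)
    return 1.5 * lorentzian.sum(axis=1) / xi.size


xi = half_filled_energies()
energies = np.linspace(0.5, 9.0, 18)
print(structure_factor(xi, 10, 0, energies))
```

A constant-energy map analogous to Fig. 2f follows from evaluating `structure_factor` for every (m1, m2) on the grid at a fixed E. For p = 0 the result vanishes identically, because in this free-spinon picture no particle-hole pair can be created without momentum transfer.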

Here we explain the details of the dynamic spin structure factor in Fig. 2f and 
Extended Data Fig. 5b, based on the particle-hole excitation of the spinon Fermi 
surface. For an infinitesimal energy transfer, the neutrons simply probe the spinon 
Fermi surface. Because the spinon particle and hole can be excited anywhere near 
the Fermi surface, the neutron spectral intensity appears from $p = 0$ to $p = 2k_F$,
where $k_F$ is the Fermi wavevector. Because $|2k_F|$ already exceeds the first Brillouin
zone, the neutron spectral intensity then covers the whole Brillouin zone, including
the Γ point. For a small but finite E, as we explain in the main text, a minimal
momentum transfer $p_{\min} \approx E/v_F$ is required to excite the spinon particle-hole
pairs. Therefore, the spectral intensity gradually moves away from the Γ point as
E increases. Because it is always possible to excite the spinon particle-hole pair with 
the momenta near the zone boundary, the spectral intensity is not greatly affected 
at the zone boundary as E increases. Thus, the broad continuum continues to cover 
a large portion of the Brillouin zone at a finite E. 

With the free spinon mean-field model $H_{\mathrm{MF}}$, we further calculate the spectral
weight along the energy direction for fixed momenta. The discrepancy between the 
theoretical results in Extended Data Fig. 5d and the experimental results in Extended 
Data Fig. 5e occurs at low energies. We attribute this low-energy discrepancy to 
the fact that the free spinon theory ignores the gauge fluctuations. The enhance- 
ment of the low-energy spectral weight compared to the free spinon results is 
then identified as possible evidence of strong gauge fluctuations in the system; 
we elaborate on this in the following discussion of the heat capacity behaviour. 
To account for the heat capacity behaviour, we suggest that the candidate
QSL is a spinon-Fermi-surface U(1) QSL. Here we elaborate on this point and
discuss the U(1) gauge fluctuation of this state in detail. The stability of U(1) QSLs
with a spinon Fermi surface against spinon confinement has been addressed
extensively**4. It was proposed and understood that the large density of gapless
fermionic spinons on the spinon Fermi surface helps to suppress the instanton
events of the compact U(1) gauge field in two-dimensional U(1) QSLs**34. The
proliferation of instanton events is the cause of the gauge confinement of a
U(1) lattice gauge theory for a U(1) QSL without gapless spinons*°. Because the
instanton event is suppressed here, the compactness of the U(1) gauge field is no
longer an issue, and the low-energy property of our U(1) QSL is then described
by gapless fermionic spinons coupled with a non-compact U(1) gauge field?”*3°°.
Owing to the coupling to the gapless spinons, the U(1) gauge photon is strongly
Landau damped and becomes very soft. The soft gauge photon strongly scatters the
fermionic spinons, provides a self-energy correction to the Green's function of the
spinon, and thus makes the quasi-particle weight for spinons equal to zero?”?3°6.
The resulting spinon non-Fermi liquid state has an enhanced density of low-energy
spinon states that results in a sublinear power-law temperature dependence
for the low-temperature heat capacity?””?""°. In addition to the heat capacity
behaviour, we find that, owing to the spinon-gauge coupling and the U(1) gauge
fluctuation, the enhanced density of the low-energy spinon states is consistent
with the enhanced spectral intensities at low energies for the fixed momenta in
Extended Data Fig. 5e.

The stability of the spinon non-Fermi liquid against spinon pairing has also been considered theoretically. When the spinons pair up, similar to the Cooper pairing of electrons in a superconductor, the continuous part of the U(1) gauge field becomes massive owing to the Anderson–Higgs mechanism, leaving the Z₂ part of the gauge field unaffected. Spinon pairing in a spinon-Fermi-surface U(1) QSL therefore produces a Z₂ QSL. Such a spinon-pairing scenario was proposed to account for the very-low-temperature behaviour of the organic spin liquid κ-(BEDT-TTF)₂Cu₂(CN)₃ (ref. 38). However, for YbMgGaO₄, we do not find any evidence of spinon pairing in either thermodynamic or spectroscopic measurements. Although the INS measurement might be limited by the energy resolution, the thermodynamic measurement did not find any suppression of the density of states down to the lowest temperatures. If a spinon-pairing instability occurs in YbMgGaO₄, it must be at a much lower temperature or energy scale than those of current and previous experiments. In any case, the presence of a spinon Fermi surface is a precondition for any spinon-pairing instability.

We now discuss the finite-temperature thermal effects on the QSL. For the spinon-Fermi-surface U(1) QSL in two dimensions, there is no line-like object in the excitation spectrum. Therefore, as the temperature is increased from this QSL ground state, there is no thermal phase transition caused by the proliferation of extended line-like excitations. Moreover, the spinon-Fermi-surface U(1) QSL does not break any symmetry. Consequently, there is no symmetry-breaking transition as the temperature is increased. The absence of a thermal phase transition is consistent with what has been observed in YbMgGaO₄. As the temperature is increased from the T = 0 K ground state of the QSL, the system involves a growing thermal mixture of excited states and gradually loses its quantum coherence. A temperature of 20 K is approximately the energy scale of the spin excitation bandwidth, which sets the interaction energy scale between the Yb local moments. At this temperature, the correlation between the local moments cannot be ignored, and its consequence is the diffusive feature in the INS spectrum. This is consistent with our data measured at 20 K shown in Extended Data Fig. 4a, b, in which the spectral weight becomes more diffusive.

Finally, we comment on the weak spectral peak at the M points at low energies (Figs 2a, 3a). This non-generic feature of the neutron spectrum is not reproduced by the theoretical calculation using the minimal spinon mean-field model in equation (1), because we did not include the effect of anisotropic spin interactions, which would break the spin rotational symmetry of equation (1). In the strongly anisotropic limit, the generic spin interaction for the Yb local moments favours stripe-like magnetic order, with wavevectors at the M points. A recent calculation showed that the anisotropic spin interaction enhances the spin correlation at the M points. Despite the presence of a weak peak at M, the spectral weight is still overwhelmingly dominated by a broad continuum across the Brillouin zone at the lowest energies measured (Figs 2a, 3a).
Dynamic spin structure factor of a Dirac QSL. As a comparison with the spinon-Fermi-surface QSL, we carry out the same calculation for the spinon mean-field Hamiltonian with a background π flux through each unit cell. This choice of background flux gives a Dirac U(1) QSL. We fix the gauge according to the hopping parameters specified in Extended Data Fig. 6a. The spinon band structure of this mean-field Hamiltonian is

$$\omega_{\mathbf{k}} = \pm\sqrt{2}\,t\,\sqrt{3 + \cos(2k_x) + 2\sin(k_x)\sin(\sqrt{3}\,k_y)},$$

where we have set the lattice constant to unity. We observe two Dirac nodes at $\mathbf{k} = (\pm\pi/2, \mp\pi/(2\sqrt{3}))$ (Extended Data Fig. 6b); the spinon Fermi energy lies exactly at the Dirac nodes.
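As a consistency check on the dispersion above, the short Python sketch below evaluates the upper band $+\omega_{\mathbf{k}}$ in units of t, confirms that the two quoted Dirac nodes are zeros of the band, scans one full period of k-space for the band maximum (which should reproduce the 3t bandwidth quoted for Extended Data Fig. 6b), and compares the node separation with the length of the M point (0, 2π/√3). The function and variable names are illustrative only and are not taken from the authors' code.

```python
import math

SQRT3 = math.sqrt(3.0)


def pi_flux_upper_band(kx: float, ky: float, t: float = 1.0) -> float:
    """Upper spinon band +omega_k of the pi-flux Hamiltonian (lattice constant = 1)."""
    arg = 3.0 + math.cos(2.0 * kx) + 2.0 * math.sin(kx) * math.sin(SQRT3 * ky)
    # Clip tiny negative round-off at the Dirac nodes before taking the square root.
    return math.sqrt(2.0) * t * math.sqrt(max(arg, 0.0))


# Dirac nodes quoted in the text: (+pi/2, -pi/(2 sqrt3)) and (-pi/2, +pi/(2 sqrt3)).
nodes = [(math.pi / 2, -math.pi / (2 * SQRT3)), (-math.pi / 2, math.pi / (2 * SQRT3))]
node_energies = [pi_flux_upper_band(kx, ky) for kx, ky in nodes]

# One full period of the dispersion: kx in [0, 2 pi), ky in [0, 2 pi / sqrt3).
# N is a multiple of 12, so the grid contains the points where the maximum occurs.
N = 360
band_max = max(
    pi_flux_upper_band(2 * math.pi * i / N, (2 * math.pi / SQRT3) * j / N)
    for i in range(N)
    for j in range(N)
)

# Separation between the two nodes, compared with |M| = 2 pi / sqrt3.
dq = (nodes[0][0] - nodes[1][0], nodes[0][1] - nodes[1][1])
m_point_length = 2 * math.pi / SQRT3

print("band energy at the nodes:", node_energies)  # both close to 0
print("upper-band maximum / t:", round(band_max, 6))  # 3.0
print("|node separation| / |M|:", math.hypot(*dq) / m_point_length)  # close to 1
```

The node separation (π, −π/√3) is half of a reciprocal lattice vector, that is, an M point, which fits the inter-Dirac-cone peak at M. Because spinon momenta in a π-flux gauge depend on the gauge choice (the magnetic unit cell is doubled), this is only a heuristic check, not a derivation of the structure factor.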

At low energies, the only spin-1 excited states involve either an intra-Dirac-cone spinon particle-hole pair or an inter-Dirac-cone particle-hole pair. Therefore, the spectral intensity of the dynamic spin structure factor should be concentrated at the momentum transfers corresponding to the intra-Dirac-cone and inter-Dirac-cone processes. As shown in Extended Data Fig. 6c, d, the dynamic spin structure factor at low energies peaks at the Γ point, the M = (0, 2π/√3) point and the symmetry-equivalent momentum points. This result differs from the broad continuum observed in the experiment. Therefore, the π-flux state is inconsistent with the experimental data, as are other Dirac spin liquids, whose low-energy spectral weight is likewise confined to a few discrete momenta connecting the Dirac nodes.

Data availability. The data that support the findings of this study are available 
from the corresponding author on reasonable request. 
Extended Data Figure 1 | Photographs, XRD patterns and field dependence of the magnetization of YbMgGaO₄. a, Photographs of a representative YbMgGaO₄ single crystal. b, XRD pattern of a YbMgGaO₄ single crystal from the cleaved surface. c, Rocking curve of the (0, 0, 18) peak. The horizontal bar indicates the instrumental resolution. d, Laue pattern of the YbMgGaO₄ single crystal viewed along the c axis. e, Observed (red) and calculated (green) XRD intensities of ground single crystals. The X-ray wavelength is 1.54 Å. The blue curve indicates the difference between the observed and calculated intensities. f, Magnetic field dependence of the magnetization at T = 2 K. Fitted g factors and Van Vleck susceptibilities χ_VV are shown (μ_B is the Bohr magneton). The dashed lines are linear fits above 12 T.
Extended Data Figure 2 | Elastic neutron scattering measurements. Elastic neutron scattering map in the (HK0) plane at 30 mK. No magnetic Bragg peaks are observed. The ring-like pattern is due to scattering from the polycrystalline Cu and Al sample holder. Because of the very large c-axis lattice constant and a small tilt of the scattering plane, the tails of some nuclear Bragg peaks for L = ±1 can also be seen. Dashed lines indicate the Brillouin zone boundaries.
Extended Data Figure 3 | Correction of neutron beam self-attenuation. a, Elastic incoherent scattering image at 20 K. b–f, Raw constant-energy images at 70 mK and at the indicated energies. The scattering intensities in d, e and f have been multiplied by 2, 4 and 8, respectively, for clarity. Dashed lines indicate the Brillouin zone boundaries.
Extended Data Figure 4 | Additional neutron scattering data at 20 K. a, b, Constant-energy images at 0.3 meV (a) and 0.6 meV (b) at 20 K. c, Intensity contour plot of the spin excitation spectrum along the high-symmetry momentum directions at 20 K. The scattering is broadened and weakened compared with that at 70 mK.
Extended Data Figure 5 | Calculation of the zero-flux Hamiltonian. a, Spinon dispersion $\omega_{\mathbf{k}}$ of the zero-flux Hamiltonian. The grey plane marks the Fermi level at ω = 0; its intersection with the band gives the Fermi surface. The light orange hexagon represents the projection of the first Brillouin zone. The maximum of $\omega_{\mathbf{k}}$ is 3t and the minimum is −6t, providing a bandwidth of 9t. b, Calculated dynamic spin structure factor along high-symmetry directions. A reciprocal lattice unit (r.l.u.) is used here, which is obtained using $H = k_x/(4\pi) - \sqrt{3}\,k_y/(4\pi)$ and $K = k_x/(4\pi) + \sqrt{3}\,k_y/(4\pi)$. c, Measured spin excitation spectrum along the high-symmetry directions at 70 mK. d, Calculated energy dispersion at the indicated momenta (marked by arrows in b). e, Measured constant-Q scans at the indicated momenta. The dashed line is the incoherent elastic line for E_f = 4 meV.
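The band limits and the r.l.u. conversion quoted in Extended Data Fig. 5 can be checked with a minimal sketch. It assumes (our assumption; equation (1) is not reproduced in this section) that the zero-flux model is uniform nearest-neighbour spinon hopping on the triangular lattice, $\omega_{\mathbf{k}} = -2t\sum_{j=1}^{3}\cos(\mathbf{k}\cdot\mathbf{a}_j)$, with unit-length bond vectors $\mathbf{a}_j$ chosen here for illustration. Under that assumption it reproduces the −6t minimum, 3t maximum and 9t bandwidth, and maps reciprocal lattice vectors to integer (H, K).

```python
import math

SQRT3 = math.sqrt(3.0)
# Assumed nearest-neighbour bond vectors of the triangular lattice (lattice constant 1).
BONDS = [(1.0, 0.0), (0.5, SQRT3 / 2), (-0.5, SQRT3 / 2)]


def zero_flux_band(kx: float, ky: float, t: float = 1.0) -> float:
    """Assumed zero-flux spinon band: -2t * sum_j cos(k . a_j)."""
    return -2.0 * t * sum(math.cos(kx * ax + ky * ay) for ax, ay in BONDS)


def to_rlu(kx: float, ky: float) -> tuple[float, float]:
    """(H, K) via H = kx/(4 pi) - sqrt3 ky/(4 pi), K = kx/(4 pi) + sqrt3 ky/(4 pi)."""
    return (kx - SQRT3 * ky) / (4 * math.pi), (kx + SQRT3 * ky) / (4 * math.pi)


# The rectangle [0, 4 pi) x [0, 4 pi / sqrt3) spans whole periods of the band.
# N is a multiple of 3, so the zone corner (4 pi / 3, 0) lies on the grid.
N = 120
values = [
    zero_flux_band(4 * math.pi * i / N, (4 * math.pi / SQRT3) * j / N)
    for i in range(N)
    for j in range(N)
]
w_min, w_max = min(values), max(values)
print(round(w_min, 9), round(w_max, 9), round(w_max - w_min, 9))  # -6.0 3.0 9.0

# M = (0, 2 pi / sqrt3) maps to about (-1/2, 1/2), which differs from the
# (1/2, -1/2) quoted for Extended Data Fig. 6 by the reciprocal lattice vector (1, -1).
print(to_rlu(0.0, 2 * math.pi / SQRT3))
```

The minimum sits at Γ and the maximum at the zone corner, so the 9t bandwidth follows directly from the range −3/2 ≤ Σⱼ cos(k·aⱼ) ≤ 3.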
Extended Data Figure 6 | Calculation of the π-flux Hamiltonian. a, Flux pattern and real nearest-neighbour hoppings on the triangular lattice. In the figure, '+t' denotes $t_{ij} = t_{ji} = t$ and '−t' denotes $t_{ij} = t_{ji} = -t$; 'π' denotes triangles that are threaded by a π flux. b, Spinon band structure of the π-flux Hamiltonian. The two bands are particle-hole related, both with bandwidths of 3t. c, Calculated momentum dependence of the dynamic spin structure factor at low energy ω = 2.1t. Strong peaks can be distinguished at the Γ point, the M = (0, 2π/√3) point ((1/2, −1/2) in r.l.u.) and equivalent positions. White dashed lines denote the zone boundaries. d, Calculated dynamic spin structure factor along high-symmetry directions with η = 0.3t.
Extended Data Table 1 | Refined structural parameters for YbMgGaO₄ at room temperature

| Atom | Parameter | Value |
|---|---|---|
| Lattice | a (Å) | 3.40125 (1) |
| Lattice | c (Å) | 25.10632 (16) |
| Yb | B₁₁ (Å²) | 0.1332 (18) |
| Yb | B₃₃ (Å²) | 0.00204 (3) |
| Mg | z | 0.21378 (6) |
| Mg | B₁₁ (Å²) | 0.131 (4) |
| Mg | B₃₃ (Å²) | 0.00161 (6) |
| Ga | z | 0.21378 (6) |
| Ga | B₁₁ (Å²) | 1.131 (4) |
| Ga | B₃₃ (Å²) | 0.00161 (6) |
| O1 | z | 0.28887 (19) |
| O1 | B₁₁ (Å²) | 0.107 (9) |
| O1 | B₃₃ (Å²) | 0.00226 (17) |
| O2 | z | 0.12884 (17) |
| O2 | B₁₁ (Å²) | 0.137 (9) |
| O2 | B₃₃ (Å²) | 0.00089 (18) |
| Fit | Rp | 1.18 |
| Fit | wRp | 1.81 |
| Fit | χ² | 2.25 |

Space group: R3̄m (number 166). Atomic positions: Yb, 3a (0, 0, 0); Mg, 6c (0, 0, z); Ga, 6c (0, 0, z); O, 6c (0, 0, z). Bᵢⱼ, Debye–Waller factor; Rp, profile factor; wRp, weighted profile factor.
Self-assembly of tetravalent Goldberg polyhedra from 144 small components

Daishi Fujita, Yoshihiro Ueda, Sota Sato, Nobuhiro Mizuno, Takashi Kumasaka & Makoto Fujita


Rational control of the self-assembly of large structures is one of the key challenges in chemistry, and is believed to become increasingly difficult and ultimately impossible as the number of components involved increases. So far, it has not been possible to design a self-assembled discrete molecule made up of more than 100 components. Such molecules (for example, spherical virus capsids) are prevalent in nature, which suggests that the difficulty in designing these very large self-assembled molecules is due to a lack of understanding of the underlying design principles. For example, the targeted assembly of a series of large spherical structures containing up to 30 palladium ions coordinated by up to 60 bent organic ligands was achieved by considering their topologies. Here we report the self-assembly of a spherical structure that also contains 30 palladium ions and 60 bent ligands, but belongs to a shape family that has not previously been observed experimentally. The new structure consists of a combination of 8 triangles and 24 squares, and has the symmetry of a tetravalent Goldberg polyhedron. Platonic and Archimedean solids have previously been prepared through self-assembly, as have trivalent Goldberg polyhedra, which occur naturally in the form of virus capsids and fullerenes. But tetravalent Goldberg polyhedra have not previously been reported at the molecular level, although their topologies have been predicted using graph theory. We use graph theory to predict the self-assembly of even larger tetravalent Goldberg polyhedra, which should be more stable, enabling another member of this polyhedron family to be assembled from 144 components: 48 palladium ions and 96 bent ligands.

It has been shown that the structures of giant MₙL₂ₙ assemblies (consisting of n ions and 2n ligands) are geometrically restricted, in that the assemblies need to be roughly spherical, isotropic structures to minimize their surface energy. If we assume the formation of roughly spherical, regular or semi-regular polyhedra, then the possible values of n for MₙL₂ₙ assemblies are limited to 6, 12, 24, 30 and 60 (refs 4, 11). On the basis of this geometric restriction, we have been able to target these structures specifically, avoiding the many possible assemblies that might form for other n. We have previously reported the self-assembly of the M₆L₁₂ octahedron, the M₁₂L₂₄ cuboctahedron, the M₂₄L₄₈ rhombicuboctahedron and the M₃₀L₆₀ icosidodecahedron (Fig. 1a). While working towards the self-assembly of the next target, the M₆₀L₁₂₀ rhombicosidodecahedron, we unexpectedly encountered an 'undefined' polyhedron with M₃₀L₆₀ composition and the topology depicted in Fig. 2. This metal complex is a seemingly isotropic polyhedron consisting of 8 triangles and 24 squares, clearly different from the M₃₀L₆₀ icosidodecahedron of the same composition, and with a high-symmetry topology that does not belong to any of the Archimedean solids. This polyhedron does not have a mirror plane and, unlike the Archimedean solids, features molecular chirality defined by its topology (Fig. 2d).

We observed this M₃₀L₆₀ framework when the experimental procedures were carried out under conditions typical for targeting the self-assembly of standard MₙL₂ₙ complexes, except that the selenophene-cored bipyridyl-type ligand 1 was used instead of an organic bipyridyl ligand (Fig. 1b). The bend angle (θ) of ligand 1 is 152°, only 3° larger than that of the thiophene-cored ligand 2 (θ = 149°), which selectively assembles into the M₂₄L₄₈ rhombicuboctahedron upon palladium(II) coordination. A small difference in the bend angles of two ligands can critically switch the resultant self-assembled structure, as has been observed for the M₁₂L₂₄-to-M₂₄L₄₈ transition at around θ = 131°–134° (ref. 22).

In addition to the observation of a simple ¹H nuclear magnetic resonance (¹H NMR) spectrum (Methods and Extended Data Fig. 1b), X-ray crystallographic analysis revealed the three-dimensional coordination geometry of an M₃₀L₆₀ complex (3) with initially undefined topology (Fig. 2). As for the other large MₙL₂ₙ polyhedra, the diffraction data collected at the synchrotron X-ray facility resembled data obtained for protein crystals rather than for conventional metal complexes. Therefore, the model was built by aligning ligand and solvent models on the $2F_{\rm o} - F_{\rm c}$ map and on the electron density map obtained using the maximum entropy method (MEM), in a manner analogous to that used in protein crystallography (Fig. 2b).
For the MₙL₂ₙ polyhedra, the data resolution is sufficiently high to study the overall topology of the structure on the basis of the cloudless electron density map. The refined three-dimensional structure of 3 (Fig. 2c) consists of 30 palladium(II) ions and 60 organic bidentate ligands, and forms a roughly spherical polyhedron that is described by a combination of 8 triangles and 24 squares.

[Figure 1 panels: a, MₙL₂ₙ Platonic or Archimedean solids; b, ligand structures (R = R′ = OCH₃; R, R′ = -OCH₂CH₂O-; θ = 149°).]

Figure 1 | MₙL₂ₙ-type polyhedral metal–organic ligand complexes. a, Schematic representation of MₙL₂ₙ complexes with the symmetry of Platonic or Archimedean solids. Each vertex represents a metal ion centre and each edge represents an organic ligand. Under the prerequisite that each vertex connects to four edges (because the metal ion used in this system, palladium(II), has a square planar coordination geometry), only five structures are allowed. MₙL₂ₙ complexes with n = 6, 12, 24 and 30 have previously been synthesized. b, Molecular structures of the organic ligands. The bend angle θ of the ligand dictates the final self-assembled product; a very small change in θ can result in very different final products.

[Figure 2 panel label: mirror plane.]

Figure 2 | X-ray crystallographic analysis of self-assembled product 3. a, MEM (maximum entropy method) electron density map (0.6 electrons per Å³) of M₃₀L₆₀ complex 3. Supplementary Video 1 shows a rotating electron density map. b, Enlarged view around a ligand in M₃₀L₆₀ with its MEM electron density map. The modelled ligand framework, metal ions and counter ions (BF₄⁻) agree well with the observed electron density. c, Entire view of the X-ray crystallographic structure of M₃₀L₆₀. Methoxy substituents and counter ions are omitted for clarity. Space group, Pbca; lattice parameters, a = 73.0 Å, b = 73.2 Å and c = 143.4 Å; data resolution, 1.95 Å; R (I > 2σ(I)), 0.2049; R_free (I > 2σ(I)), 0.2391. d, Simplified image of the obtained structure. Each vertex represents a metal ion centre and each edge represents an organic ligand. The structure has polyhedral chirality; a pair of left- and right-handed forms is shown.

The symmetry and topology of polyhedron 3 can be understood by extending the description of Goldberg polyhedra. Goldberg polyhedra, first described in 1937, are convex polyhedra made up of hexagons and pentagons (Fig. 3a), and provide a theoretical foundation with which to describe the topology of fullerenes and some virus capsids. In this family of Goldberg polyhedra, 12 pentagons are evenly distributed within a hexagonal honeycomb sheet that fully circumscribes a sphere. Each polyhedron is defined by the relative position of the closest pentagons using two indices, h and k, and the notation G(h, k). Triangulation (T) numbers are defined by the equation T = h² + hk + k², where h and k are non-negative integers, so that T = 1, 3, 4, 7, 9, 12, 13, 16, ... (Fig. 3b); the discussion of T numbers in the context of virus capsids originates from this equation.
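The T-number sequence can be reproduced by enumerating small index pairs. The sketch below also checks that T equals the squared distance for h steps along one honeycomb axis plus k steps along an axis at 60° (the construction in Fig. 3b). For the face counts it uses the standard result for classical Goldberg polyhedra (12 pentagons, 10(T − 1) hexagons, 20T vertices, 30T edges). That result is not stated in the text, so treat it as a supplied assumption. The helper names are hypothetical.

```python
import math
from itertools import product


def t_number(h: int, k: int) -> int:
    """Triangulation number T = h^2 + hk + k^2."""
    return h * h + h * k + k * k


def hex_distance_sq(h: int, k: int) -> float:
    """Squared distance for h steps along one axis and k steps along an axis at 60 degrees."""
    x = h + k * math.cos(math.radians(60))
    y = k * math.sin(math.radians(60))
    return x * x + y * y


def goldberg_counts(t: int) -> tuple[int, int, int, int]:
    """(vertices, edges, pentagons, hexagons) of a classical Goldberg polyhedron with this T."""
    return 20 * t, 30 * t, 12, 10 * (t - 1)


# h, k <= 4 covers every T <= 16 (h or k >= 5 already gives T >= 25).
t_values = sorted({t_number(h, k) for h, k in product(range(5), repeat=2)
                   if (h, k) != (0, 0) and t_number(h, k) <= 16})
print(t_values)  # [1, 3, 4, 7, 9, 12, 13, 16]

for t in t_values:
    v, e, pentagons, hexagons = goldberg_counts(t)
    assert v - e + (pentagons + hexagons) == 2  # Euler's polyhedron formula
print(goldberg_counts(3))  # (60, 90, 12, 20), the C60 fullerene cage
```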

Plane-graph theory has been used to extend the original description of Goldberg polyhedra to create new families with more complicated topologies and more systematic notation; all possible topologies are tabulated in refs 26–28. We denote classical and extended Goldberg polyhedra by tri-G(h, k) and tet-G(h, k), respectively, where the prefixes 'tri' and 'tet' indicate trigonal and tetragonal nodes of the polyhedra. As indicated by plane-graph theory, the surfaces of tet-G(h, k) polyhedra consist of eight triangles evenly distributed within a square grid network; the two indices h and k represent the relative positions of the closest triangles. The quadrangulation (Q) number is defined by the equation Q = h² + k², which follows from the Pythagorean theorem (Q = 1, 2, 4, 5, 8, 9, 10, 13, 16, ...) (Fig. 3c). For Q = 1, the notation tet-G(1, 0) (which is equivalent to tet-G(0, 1)) indicates that the closest triangles are one step away from each other, giving rise to an octahedron (Fig. 3d). For Q = 2, tet-G(1, 1) is specified, with the closest triangles located one step horizontally and one step vertically away from each other, resulting in a cuboctahedron (Fig. 3e). Q = 4 represents tet-G(2, 0) (equivalent to tet-G(0, 2)), in which the closest triangles are two steps away from each other; this results in a rhombicuboctahedron (Fig. 3f).
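By analogy with the T numbers, the Q sequence is the set of sums of two squares. The sketch below enumerates it and records which (h, k) pairs realize each Q. Swapped pairs such as (1, 0) and (0, 1) describe the same achiral polyhedron, so the pair lists overcount the achiral cases. The names are illustrative only.

```python
from itertools import product


def q_number(h: int, k: int) -> int:
    """Quadrangulation number Q = h^2 + k^2 (squared distance on a square grid)."""
    return h * h + k * k


# Map each Q <= 16 to its index pairs (h, k >= 0, not both zero).
q_pairs: dict[int, list[tuple[int, int]]] = {}
for h, k in product(range(5), repeat=2):
    q = q_number(h, k)
    if (h, k) != (0, 0) and q <= 16:
        q_pairs.setdefault(q, []).append((h, k))

q_values = sorted(q_pairs)
print(q_values)    # [1, 2, 4, 5, 8, 9, 10, 13, 16]
print(q_pairs[5])  # [(1, 2), (2, 1)]
```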

Within this framework, our initially undefined complex 3 is a tet-G(2, 1) polyhedron, with Q = 5. Both a schematic (Fig. 3g) and the crystallographic structure (Fig. 2a–c) reveal the closest triangles to be placed two steps horizontally and one step vertically away from each other. Q = 5 is the first Q number with an 'asymmetric' (h, k) combination: for Q = 1, 2 and 4, the h and k indices are 'symmetric' in the sense that they can be exchanged without altering the corresponding structure (that is, tet-G(h, k) = tet-G(k, h)); however, for Q = 5, tet-G(1, 2) and tet-G(2, 1) give two mirror-image graphs and thus generate an enantiomeric pair of chiral frameworks (Fig. 3g). The graph presentation in Fig. 3d–g illustrates whether a given tet-G(h, k) polyhedron is chiral or not (those for which h ≠ 0, k ≠ 0 and h ≠ k are chiral; all others are not). Using established polyhedral formulae, primarily Euler's polyhedron theorem, we determined the structures of the tet-G(h, k) polyhedra for Q numbers up to 10 (Fig. 3h).
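The bookkeeping behind these structures can be sketched with Euler's theorem. One assumption is supplied: the palladium (vertex) count n = 6Q. It is not stated explicitly in the text but is inferred from the compositions listed here (M₆L₁₂ for Q = 1 through M₆₀L₁₂₀ for Q = 10). With four edges per Pd(II) vertex, E = 2n ligands; Euler's formula gives F = n + 2 faces; and subtracting the 8 triangles leaves the squares. The chirality test is the rule stated above. `tet_goldberg` is a hypothetical helper.

```python
def tet_goldberg(h: int, k: int) -> dict:
    """Summarize tet-G(h, k) for the Pd_nL_2n family (hypothetical helper)."""
    q = h * h + k * k
    n_metal = 6 * q                    # assumption: vertex count 6Q, matching the listed compositions
    n_ligand = 2 * n_metal             # each Pd(II) vertex has four edges, so E = 4V / 2
    n_faces = 2 - n_metal + n_ligand   # Euler: V - E + F = 2
    n_triangles = 8
    n_squares = n_faces - n_triangles
    # Each edge borders two faces, so 3 * triangles + 4 * squares must equal 2E.
    assert 3 * n_triangles + 4 * n_squares == 2 * n_ligand
    chiral = h != 0 and k != 0 and h != k
    return {"Q": q, "formula": f"M{n_metal}L{n_ligand}", "squares": n_squares, "chiral": chiral}


for hk in [(1, 0), (1, 1), (2, 0), (2, 1), (2, 2), (3, 0), (3, 1)]:
    print(hk, tet_goldberg(*hk))
```

For (2, 1) this gives M30L60 with 24 squares (chiral), and for (2, 2) it gives M48L96 with 42 squares (achiral), matching the face counts quoted for complexes 3 and 4.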

The underlying theory of the topology of MₙL₂ₙ complexes (based on the graph theory of Goldberg polyhedra) predicts that tet-G(h, k) polyhedra with Q ≥ 8 should also exist. We deduced the structures of M₄₈L₉₆ (tet-G(2, 2), Q = 8), M₅₄L₁₀₈ (tet-G(3, 0), Q = 9) and M₆₀L₁₂₀ (tet-G(3, 1), Q = 10) (Extended Data Fig. 2), and decided to target the predicted (but as yet unobserved) M₄₈L₉₆ complex (4), the tet-G(2, 2) polyhedron consisting of 8 triangles and 42 squares (Fig. 4a, inset).

We expected that M₃₀L₆₀ (3) would be a kinetically trapped metastable structure, because modelling shows that the bend angle of 152° in ligand 1 should favour the less distorted M₄₈L₉₆. This suggested that M₃₀L₆₀ could be converted to M₄₈L₉₆ under suitable conditions, prompting us to examine a range of self-assembly conditions (changing the concentration, reaction time, temperature and so on) and the use of modified ligands. NMR spectra of the huge molecules in the M₄₈L₉₆ class of compounds are broadened, and so are not helpful for structural characterization. We were also unable to observe M₄₈L₉₆ or larger species by mass spectrometry, leaving X-ray crystallography as the only method capable of providing direct evidence for the assembly of M₄₈L₉₆ or larger structures. This involved carefully observing, under a microscope, the morphology of single crystals that formed directly upon extremely slow (over 2–3 months) vapour diffusion of isopropyl acetate into the dimethylformamide (DMF) reaction solutions. Diffraction data collected in preliminary screening experiments at a synchrotron facility indicated cell parameters close to those of 3 in most cases, but a very small fraction of the crystals taken from reaction solutions prepared under forcing conditions (70 °C for 48 h) showed a different diffraction pattern.

We collected data for more than 10 of the crystals with cell parameters different from those of 3. Despite numerous attempts, a resolution better than 2.85 Å could not be obtained; the inherent properties of the compounds in the class we study (large dimensions and huge void spaces filled with solvents) make the compounds protein-like, so we cannot expect to achieve the higher resolution that is typical for standard metal complexes studied using conventional molecular crystallography. Several of the best diffraction datasets were subjected to structure refinement.

The obtained electron density map clearly shows a very large spherical entity that consists of a framework made up of triangles and squares. Although the large molecular diameter and the disordered solvent molecules within the huge void space (70% of the unit cell volume) substantially attenuated the diffraction intensity, particularly in the high-angle region, strong peaks from heavy atoms (palladium atoms at the nodes and selenium atoms in the middle of the edges) made the structure refinement possible (Fig. 4b, c). The positions of the selenium and palladium atoms were clearly refined, and then ligand 1, which bridges the palladium centres, was modelled with its density functional theory (DFT)-optimized structure, as is common in protein crystallography (Fig. 4b). From the refined structure, we were able to confirm the overall geometry of the tet-G(2, 2) polyhedron (Q = 8) and, hence, the successful self-assembly of the M₄₈L₉₆ complex. Within the large spherical framework, the closest triangles are in an (h, k) = (2, 2) relationship, two steps horizontally and two steps vertically away from each other (Fig. 4d). Our crystallographic data provide a clear visualization of this tet-G(2, 2) polyhedral framework.

[Figure 3h legend: observed; observed in this work; theoretically predicted.]

Figure 3 | Schematic representation of Goldberg polyhedra. a, Goldberg polyhedra consist of pentagons and hexagons. Each polyhedron is denoted by the locational relationship between the closest pair of pentagons. In this case, the polyhedron is denoted tri-G(2, 4) (h = 2, k = 4). b, Definition of T number. The T number represents the square of the distance between the two closest pentagons on the hexagonal network. From the Pythagorean theorem, T = |h + k|² = (|h| + |k| cos 60°)² + (|k| sin 60°)² = h² + hk + k². c, Definition of Q number. By simple analogy with the T number, the Q number is defined by the square of the distance between the two closest triangles on the square grid network. d–f, Diagrams of tet-G(h, k) polyhedra. tet-G(1, 0) with Q = 1 is equivalent to an octahedron (d), tet-G(1, 1) with Q = 2 is a cuboctahedron (e) and tet-G(2, 0) with Q = 4 is a rhombicuboctahedron (f). g, Diagram of tet-G(2, 1) and tet-G(1, 2) (Q = 5), which have topologies identical to those of the left- and right-handed forms of M₃₀L₆₀, respectively, shown in Fig. 1a, b. The 'asymmetric' h and k values account for the origin of the chirality. h, Summary of extended Goldberg polyhedra ordered by Q number. Only those corresponding to Q = 1, 2 and 4 had been observed experimentally before this work; in this work we observe those corresponding to Q = 5 and 8; the others are predicted by theory but have yet to be observed.

Although it was a serendipitous experimental observation that led us to explore the theory of tetravalent Goldberg polyhedra, this theory then enabled us to consider and achieve the self-assembly of the giant M₄₈L₉₆ polyhedron, which contains the largest number of components observed in a self-assembled molecular structure so far. Tetravalent Goldberg polyhedra, which have square packing, appear rarely in nature; by contrast, hexagonal packing (as seen in graphite and honeycombs, and in trivalent Goldberg polyhedra such as fullerenes and virus capsids) is frequently observed. The reason for this difference in prevalence is presumably that square packing is less dense than hexagonal packing. The square planar coordination geometry of palladium(II) ions is a rare example of a square motif that can induce the unusual square packing and, here, enabled the formation of tetravalent Goldberg polyhedra. Given that quasi-crystals were once considered obscure, but are now rationally designed and known to be ubiquitous in many areas, we expect that many other examples of tetravalent Goldberg polyhedra will be discovered or artificially manufactured in the near future.

[Figure 4a inset label: tet-G(2, 2).]

Figure 4 | X-ray crystallographic analysis of self-assembled product 4. a, Entire view of the MEM electron density map (0.6 electrons per Å³) for M₄₈L₉₆ complex 4 (right). This complex has the topology of the tet-G(2, 2) polyhedron with Q = 8 (left). Supplementary Video 2 shows a rotating electron density map. b, Enlarged view around a ligand in M₄₈L₉₆ with its MEM electron density map. The modelled ligand structure agrees well with the observed electron density. Space group, I4; lattice parameters, a = b = 63.7 Å, c = 94.6 Å; data resolution, 2.85 Å; R (I > 2σ(I)), 0.2181; R_free (I > 2σ(I)), 0.2465. c, Sliced image of the M₄₈L₉₆ complex with the MEM electron density map. No distinguishable peaks in electron density are observed in the void space; this supports the validity of the model building. d, Crystal structure of 4 emphasizing the tet-G(2, 2) topology and metal centres.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


General information. NMR spectra were measured on a Bruker DRX-500 
(500 MHz) spectrometer equipped with a BBO gradient probe and a Bruker 
AV-500 (500 MHz) spectrometer equipped with a TCI gradient CryoProbe. 
Infrared spectra were measured as KBr pellets using a DIGILAB FTS-7000 
instrument. Melting points were determined on a Yanaco MP-500V apparatus. 
Solvents and reagents were purchased from TCI, WAKO Pure Chemical Industries 
and Sigma-Aldrich. Unless otherwise noted, all of the chemicals were reagent grade 
and used without any further purification. 

Synthesis of ligand 1. 4-Pyridylboronic acid pinacol ester (440 mg, 2.2 mmol), PdCl₂(dppf) (50 mg, 0.072 mmol) and K₃PO₄ (910 mg, 4.3 mmol) were added at room temperature to a solution of 2,5-dibromo-3,4-dimethoxyselenophene (270 mg, 0.72 mmol) in toluene/i-PrOH/H₂O (5.0 ml, 3/1/1), and the resulting mixture was stirred at 85 °C for 16 h. The reaction mixture was filtered through a Celite pad and washed with AcOEt. The filtrate was washed with water and brine, dried over anhydrous sodium sulfate and evaporated in vacuo. The residue was purified by column chromatography on silica gel (chloroform:methanol = 100:0 → 98:2) to give 1 as a brown solid. The resultant material was further purified by gel permeation chromatography (eluent: CHCl₃) to afford a pale yellow solid (130 mg, 53%). Melting point: 120–121 °C. ¹H NMR (500 MHz, CDCl₃, 27 °C) δ (p.p.m.): 8.62 (d, J = 6.0 Hz, 4H), 7.57 (d, J = 6.0 Hz, 4H), 3.91 (s, 6H). ¹³C NMR (125 MHz, CDCl₃, 27 °C) δ (p.p.m.): 150.7 (C), 150.4 (CH), 141.1 (C), 127.2 (C), 121.1 (CH), 60.3 (CH₃). Infrared (cm⁻¹): 3,206, 3,040, 2,846, 1,589, 1,540, 1,489, 1,415, 1,370, 1,321, 1,292, 1,224, 1,214, 1,114, 1,099, 1,070, 1,030, 1,000, 990, 900, 864, 818. Mass spectrometry (MS) (EI): calculated for C₁₆H₁₄O₂N₂Se [M⁺] m/z 346.0, found 346.0. HRMS (ESI TOF): calculated for C₁₆H₁₅O₂N₂Se [M + H]⁺ m/z 347.0294, found 347.0297. Elemental analysis: calculated for C₁₆H₁₄O₂N₂Se: C, 55.66; H, 4.09; N, 8.11. Found: C, 55.52; H, 4.12; N, 8.07. See Extended Data Figs 3 and 4 for full-range NMR spectra.
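The calculated analytical values for ligand 1 follow from its formula, C₁₆H₁₄O₂N₂Se. The sketch below recomputes them. The atomic-mass tables are standard reference values that we supply (IUPAC abridged standard atomic weights for the elemental analysis; monoisotopic masses with ⁸⁰Se for HRMS); they are not taken from the paper. With these tables the [M + H]⁺ value comes out at 347.0293, within 0.0001 of the quoted 347.0294.

```python
# Reference values supplied for this sketch (not from the paper).
AVERAGE_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "Se": 78.971}
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401,
                     "O": 15.99491462, "Se": 79.9165218}  # 80Se, the most abundant isotope
ELECTRON_MASS = 0.00054858  # u

LIGAND_1 = {"C": 16, "H": 14, "N": 2, "O": 2, "Se": 1}  # C16H14O2N2Se


def mass(formula: dict[str, int], table: dict[str, float]) -> float:
    """Sum of atomic masses for a formula given as {element: count}."""
    return sum(table[el] * count for el, count in formula.items())


def mass_percent(formula: dict[str, int], element: str) -> float:
    """Mass percentage of one element, as reported in elemental analysis."""
    return 100.0 * AVERAGE_MASS[element] * formula[element] / mass(formula, AVERAGE_MASS)


def protonated_mz(formula: dict[str, int]) -> float:
    """m/z of [M + H]+: neutral monoisotopic mass plus one H atom, minus one electron."""
    return mass(formula, MONOISOTOPIC_MASS) + MONOISOTOPIC_MASS["H"] - ELECTRON_MASS


for el in ("C", "H", "N"):
    print(el, round(mass_percent(LIGAND_1, el), 2))  # C 55.66, H 4.09, N 8.11
print(round(protonated_mz(LIGAND_1), 4))
```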
Self-assembly of M₃₀L₆₀ complex 3. A solution of [Pd(CH₃CN)₄](BF₄)₂ (5.2 μmol) in DMSO-d₆ was added to a solution of ligand 1 (3.14 mg, 10 μmol) in DMSO-d₆ (0.50 ml), and the resulting mixture was stirred at room temperature for 3 h. The reaction can also be carried out in DMF-d₇. ¹H NMR (500 MHz, DMSO-d₆, 300 K) δ (p.p.m.): 9.2 (br), 7.9 (br), 3.8 (br). ¹H NMR (500 MHz, DMF-d₇, 300 K) δ (p.p.m.): 9.4 (br), 8.0 (br), 3.9 (br).

The 1H nuclear magnetic resonance (1H NMR) spectrum showed considerable
broadening and downfield shifts of the peaks assigned to the aromatic protons
(Extended Data Fig. 1b). Downfield shifts of PyHα (Δδ = 0.57 p.p.m.) and PyHβ
(Δδ = 0.22 p.p.m.) suggested that the pyridines were coordinated to palladium(II)
ions, and the broadened signals were consistent with the formation of a very large
molecular structure with high symmetry. 1H diffusion-ordered NMR spectroscopy
(1H DOSY) also indicated the formation of a single product with a diffusion
coefficient D of 2.6 × 10⁻¹¹ m² s⁻¹ (log D = −10.58) at 300 K in DMSO-d6 (Extended
Data Fig. 1c). The observed value of D is smaller than that for the corresponding
M24L48 complex, which suggests the formation of a larger product. See Extended
Data Figs 5–8 for full-range NMR spectra. Reliable evidence for the formation of
3 was obtained by the crystallographic analysis described below.
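The inference that a smaller D indicates a larger product rests on the Stokes–Einstein relation, r = kT/(6πηD), in which the hydrodynamic radius is inversely proportional to D. The sketch below illustrates that relation. It treats the cage as a hard sphere, which is an approximation, and the solvent viscosity `ETA_ASSUMED` is a hypothetical value that is not given in the text. The absolute radius therefore scales directly with that assumption, and only the inverse proportionality is robust.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J K-1


def hydrodynamic_radius(d_m2_s: float, temperature_k: float, viscosity_pa_s: float) -> float:
    """Stokes-Einstein radius r = kT / (6 pi eta D), in metres."""
    return K_B * temperature_k / (6.0 * math.pi * viscosity_pa_s * d_m2_s)


D_COMPLEX_3 = 2.6e-11   # m2 s-1, DOSY value for 3 in DMSO-d6 at 300 K (from the text)
ETA_ASSUMED = 2.0e-3    # Pa s, hypothetical viscosity of DMSO-d6 near 300 K

r_3 = hydrodynamic_radius(D_COMPLEX_3, 300.0, ETA_ASSUMED)
print(f"log D = {math.log10(D_COMPLEX_3):.3f}")
print(f"r_H ~ {r_3 * 1e9:.1f} nm (scales with the assumed viscosity)")
```

Halving D at a fixed temperature and viscosity doubles the estimated radius. This is why the smaller D of 3, compared with that of the M24L48 complex, points to a larger assembly.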

Self-assembly of M48L96 complex 4. M48L96 complex 4 was obtained as a side
product of 3 when the complex was prepared under forcing conditions (70 °C for
48 h). Presumably, complex 3 is a kinetically trapped product, which partially
converted into the thermodynamic product 4 at 70 °C. The ratio of 3 to 4 could not be
determined because the broadened 1H NMR signals of the two complexes overlap
completely, and the two complexes could not be distinguished by DOSY either. From
this solution, single crystals of 3 and 4 were obtained directly as a mixture by
extremely slow (over 2–3 months) vapour diffusion of isopropyl acetate into the DMF
reaction solutions. Under a microscope, we tried to distinguish the single crystals of
3 and 4 by their crystal morphology. Although most of the crystals diffracted with
the cell parameters of 3, a very small portion of the crystals diffracted with a
different pattern in preliminary screening experiments at a synchrotron facility, and
these were subsequently characterized as 4 by crystallographic analysis.
Crystallographic analysis of 3 and 4. Single crystals of 3 were prepared by
extremely slow vapour diffusion (over 2–3 months) of isopropyl acetate into an
N,N-dimethylformamide (DMF) solution of 3 (with tetrafluoroborate as the counter
anion). Single crystals of 4 were obtained in a similar way. The final diffraction data
were measured at the BL38B1 beamline (Rayonix MX225HE detector) and the BL41XU
beamline (PILATUS 6M detector) at SPring-8. XDS was used for data processing and
reduction for 3 and 4. The structures were solved and refined with SHELXT and
SHELXL. The data were not of high enough resolution to allow calculation of
the hydrogen positions. The crystal lattice contains huge voids (75.1% for 3 and
70.2% for 4) with a large number of disordered solvent molecules and anions. Owing
to the very poor diffraction power of the crystals (resolution of 1.95 Å for 3 and
2.85 Å for 4), solvent molecules could not be located as realistic molecular structures,
even in the maximum entropy density map calculated with ENIGMA. The ligands
showed very large thermal motion (possibly as a result of positional disorder, which,
however, could not be modelled), so many restraints on bond distances and
thermal parameters (DFIX, DANG and FLAT) had to be applied to the ligands,
counter anions and Pd–N distances to avoid chemically unreasonable bond
distances and angles. Pd atoms were refined anisotropically; all other atoms
were refined isotropically. ENIGMA was used to calculate a reasonable electron
density map with the maximum entropy method (MEM). For 3, the total number of
electrons in the crystal, approximately 321,000, was determined by summing the
modelled electrons of M30L60 and BF4⁻ (14,280 × 8) and the product of the solvent
volume (approximately 581,500 Å³) and the average electron density of DMSO
(0.356 electrons per Å³). The DMSO solvent molecules were determined from the
MEM map. For 4, the total of approximately 142,000 electrons was determined as the
sum of the modelled electrons of M48L96 and BF4⁻ (45,696) and the product of the
solvent volume (approximately 269,438 Å³) and the average electron density of
DMSO (0.356 electrons per Å³). The crystallographic data are summarized in
Extended Data Table 1.
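The electron totals used in the MEM analysis can be reproduced with a short calculation. The sketch below makes the following assumptions, none of which is stated explicitly in the text:

- Atoms are counted as neutral, so Pd contributes 46 electrons and BF4 contributes 41. This reproduces the 14,280 electrons per M30L60(BF4)60 quoted above.
- The cell of 4 contains two M48L96 cages. This is inferred from the empirical formula in Extended Data Table 1.
- Bulk DMSO has a density of 1.100 g cm⁻³ and a molar mass of 78.13 g mol⁻¹.

```python
AVOGADRO = 6.02214076e23   # mol-1

# Electrons per formula unit = sum of atomic numbers (neutral-atom counting)
E_LIGAND = 16 * 6 + 14 * 1 + 2 * 7 + 2 * 8 + 34   # C16H14N2O2Se -> 174
E_PD = 46
E_BF4 = 5 + 4 * 9                                  # B + 4 F -> 41


def cage_electrons(n_pd: int, n_ligand: int, n_bf4: int) -> int:
    """Modelled electrons of one cage plus its counter anions."""
    return n_pd * E_PD + n_ligand * E_LIGAND + n_bf4 * E_BF4


def solvent_electron_density(density_g_cm3: float, molar_mass: float, electrons: int) -> float:
    """Electrons per cubic angstrom of a bulk liquid (1 cm3 = 1e24 A3)."""
    return density_g_cm3 / molar_mass * AVOGADRO * 1e-24 * electrons


rho_dmso = solvent_electron_density(1.100, 78.13, 42)   # assumed bulk DMSO, C2H6OS
total_3 = 8 * cage_electrons(30, 60, 60) + 581_500 * rho_dmso
total_4 = 2 * cage_electrons(48, 96, 96) + 269_438 * rho_dmso
print(f"rho(DMSO) = {rho_dmso:.3f} e/A3; 3: {total_3:,.0f}; 4: {total_4:,.0f}")
```

Rounded to the nearest thousand, the two totals agree with the approximately 321,000 and 142,000 electrons given above.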

Mathematical discussion. The surface of any convex polyhedron has Euler characteristic

$$V - E + F = 2,$$

where V, E and F are the numbers of vertices, edges and faces, respectively, of the
given polyhedron. In addition, the following equation for tetravalent Goldberg
polyhedra is derived from equation (53) in ref. 35:

$$3F_3 + 2F_4 - 2V_4 = 12,$$

where F3 is the number of triangles, F4 is the number of squares and V4 is the number
of 4-valent vertices. If we set the number of triangles to 8, we obtain a very simple
formula for the tet-G(h, k) structures discussed here:

$$F - V = 2$$

(for tetravalent Goldberg polyhedra, F = F3 + F4 and V = V4). Because every vertex is
4-valent, E = 2V, and because V = 6Q, the values of V, E and F can be determined for
any tet-G(h, k) polyhedron.
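The counting argument can be expressed in a few lines. The sketch below takes V = 6Q and F3 = 8 from the text. It also assumes Q = h² + k², an assumption inferred from the Q values listed in Extended Data Fig. 2 (9, 10 and 13 for tet-G(0, 3), tet-G(1, 3) and tet-G(2, 3)) rather than a formula stated here. Under that assumption, tet-G(1, 2) and tet-G(2, 2) give V = 30 and V = 48. These values would match the Pd counts of the M30L60 and M48L96 cages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TetGoldbergCounts:
    q: int
    vertices: int
    edges: int
    faces: int
    triangles: int
    squares: int


def tet_goldberg_counts(h: int, k: int) -> TetGoldbergCounts:
    """Vertex, edge and face counts of a tet-G(h, k) polyhedron."""
    q = h * h + k * k          # assumed: Q = h^2 + k^2, consistent with Extended Data Fig. 2
    if q == 0:
        raise ValueError("h and k cannot both be zero")
    v = 6 * q                  # V = 6Q (stated in the text)
    e = 2 * v                  # each vertex is 4-valent, so 2E = 4V
    f = v + 2                  # Euler's formula V - E + F = 2 with E = 2V
    triangles = 8
    return TetGoldbergCounts(q, v, e, f, triangles, f - triangles)


for h, k in [(1, 1), (2, 0), (1, 2), (2, 2), (0, 3)]:
    c = tet_goldberg_counts(h, k)
    print(f"tet-G({h},{k}): Q={c.q} V={c.vertices} E={c.edges} F={c.faces} "
          f"(F3={c.triangles}, F4={c.squares})")
```

Every result also satisfies 3F3 + 2F4 − 2V4 = 12, because F4 = V − 6 whenever F3 = 8.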
Data availability. All of the data generated and analysed during this study are 
included in this Letter, its Extended Data and its Supplementary Information. 
Extended Data Figure 1 | NMR study of the self-assembly of ligand 1. a, 1H NMR spectrum (500 MHz, DMSO-d6, 300 K) of ligand 1. The signal denoted PyHα is derived from the protons in the pyridyl α-position; that denoted PyHβ is derived from the protons in the pyridyl β-position. The signal denoted -OCH3 is from the methoxy protons. b, 1H NMR spectrum of 1 after self-assembly with palladium(II) ions (BF4⁻ salt). The aromatic signals are shifted downfield and heavily broadened. c, 1H DOSY spectrum of 1 after self-assembly with palladium(II) ions (BF4⁻ salt). The spectrum indicates a single product with a diffusion coefficient D of 2.6 × 10⁻¹¹ m² s⁻¹ (log D = −10.58). The grey band is a guide to the eye. All of the NMR spectra (500 MHz) were measured for DMSO-d6 solutions at 300 K.
[Figure panels: tet-G(0, 3), Q = 9; tet-G(1, 3), Q = 10; tet-G(2, 3), Q = 13]

Extended Data Figure 2 | Schematic representation of larger tet-G(h, k) polyhedra. Polyhedra with the topology of tet-G(3, 0) (or, equivalently, tet-G(0, 3); Q = 9), tet-G(1, 3) (Q = 10) or tet-G(2, 3) (Q = 13). For Q = 10 and Q = 13, the other structure in the chiral pair (tet-G(3, 1) and tet-G(3, 2), respectively) is a mirror image of the polyhedron shown.
Extended Data Figure 3 | 1H NMR of ligand 1. Full-range spectrum: 500 MHz, CDCl3, 27 °C.

Extended Data Figure 4 | 13C NMR of ligand 1. Full-range spectrum: 125 MHz, CDCl3, 27 °C.

Extended Data Figure 5 | 1H NMR of 3. Full-range spectrum: 500 MHz, DMSO-d6, 27 °C.

Extended Data Figure 6 | 1H NMR of 3. Full-range spectrum: 500 MHz, DMF-d7, 27 °C.

Extended Data Figure 7 | 1H DOSY NMR of 3. Full-range spectrum: 500 MHz, DMSO-d6, 27 °C. For comparison with M12L24 or M24L48 complexes, see ref. 36.

Extended Data Figure 8 | 1H DOSY NMR of 3. Full-range spectrum: 500 MHz, DMF-d7, 27 °C. For comparison with M12L24 or M24L48 complexes, see ref. 36.
Extended Data Table 1 | Crystal data and structural refinement for M30L60 (3) and M48L96 (4)

| | M30L60 (3) | M48L96 (4) |
|---|---|---|
| Measurement | | |
| X-ray source | SPring-8 BL38B1 | SPring-8 BL41XU |
| Detector | Rayonix MX225HE | PILATUS 6M |
| Temperature | 100 K | 296(2) K |
| Wavelength | 0.97900 Å | 0.70000 Å |
| Crystal system | Centric | Centric |
| Crystal size | 100 × 100 × 80 μm³ | 150 × 150 × 50 μm³ |
| Data processing | | |
| Space group | Pbca | I4 |
| Unit cell dimensions | a = 73.0(1) Å, b = 73.2(1) Å, c = 143.4(2) Å | a = b = 63.7(3) Å, c = 94.6(3) Å |
| Volume | 774,959.56 Å³ | 383,816.78 Å³ |
| Resolution | 100.0–1.95 (2.06–1.95) Å | 100.0–2.85 (3.02–2.85) Å |
| Absorption correction | Empirical | Empirical |
| Reflections collected | 403,060 | 30,333 |
| Independent reflections | 54,165 | 8,637 |
| Completeness | 99.6 (97.4)% | 99.6 (98.8)% |
| Rmerge | 0.160 (0.887) | 0.108 (0.682) |
| I/σ | 8.52 (2.35) | 7.87 (1.64) |
| Redundancy | 7.44 (7.36) | 3.51 (3.19) |
| Refinement | | |
| Empirical formula | C7680H6720N960O960Se480Pd240B480F1920 | C3072H2688N384O384Se192Pd96B192F768 |
| Data / restraints / parameters | 48,312 / 4,440 / 6,813 | 8,234 / 1,818 / 2,543 |
| Goodness-of-fit on F² | 3.309 | 3.725 |
| R (I > 2σ(I)) | 0.2048 | 0.2198 |
| Rfree (I > 2σ(I)) | 0.2404 | 0.2658 |
| R (all data) | 0.2521 | 0.2926 |
| Rfree (all data) | 0.2835 | 0.3267 |
| wR2 (all data) | 0.5012 | 0.5405 |
| Largest diff. peak and hole | 0.42 and −0.45 e Å⁻³ | 0.38 and −0.37 e Å⁻³ |
| MEM refinement | | |
| Reflections calculated | 41,586 | 3,680 |
| Estimated total charge | 321,000 | 142,000 |
| R | 0.050 | 0.054 |
| wR | 0.035 | 0.029 |
Water balance creates a threshold in soil pH at the global scale

E. W. Slessarev, Y. Lin, N. L. Bingham, J. E. Johnson, Y. Dai, J. P. Schimel & O. A. Chadwick


Soil pH regulates the capacity of soils to store and supply nutrients, 
and thus contributes substantially to controlling productivity in 
terrestrial ecosystems. However, soil pH is not an independent
regulator of soil fertility—rather, it is ultimately controlled by 
environmental forcing. In particular, small changes in water 
balance cause a steep transition from alkaline to acid soils across 
natural climate gradients. Although the processes governing this
threshold in soil pH are well understood, the threshold has not 
been quantified at the global scale, where the influence of climate 
may be confounded by the effects of topography and mineralogy. 
Here we evaluate the global relationship between water balance 
and soil pH by extracting a spatially random sample (n = 20,000)
from an extensive compilation of 60,291 soil pH measurements. 
We show that there is an abrupt transition from alkaline to acid 
soil pH that occurs at the point where mean annual precipitation 
begins to exceed mean annual potential evapotranspiration. We 
evaluate deviations from this global pattern, showing that they may 
result from seasonality, climate history, erosion and mineralogy. 
These results demonstrate that climate creates a nonlinear pattern 
in soil solution chemistry at the global scale; they also reveal 
conditions under which soils maintain pH out of equilibrium with 
modern climate. 

Climate controls many aspects of soil chemistry, affecting soil pH 
(ref. 4). Alkaline soils are known to be common in arid climates, while 
acid soils are known to be common in humid climates. Surprisingly,
however, the global-scale mechanisms governing this pattern remain only
broadly defined and untested by direct observation. What are the
dominant chemical equilibria that constrain soil pH? What aspect of 
climate defines the transition between alkaline and acid soils, and is the 
transition linear? The answers to these questions are fundamental to 
understanding soil development and surface geochemistry at the global 
scale. Furthermore, achieving this understanding may prove essential 
for representing soils in models of the terrestrial biosphere, given that 
soil pH controls many aspects of soil fertility. Here we illustrate that
simple geochemical and hydrological concepts can be used to build a 
mechanistic understanding of soil pH at the global scale. 

Interpretations of acid-titration experiments indicate that soil pH
is typically most strongly buffered by equilibrium with two secondary
minerals: calcite (CaCO3) or gibbsite (Al(OH)3). CaCO3 precipitates
from calcium ions (Ca²⁺) and carbonate ions (CO3²⁻) derived from
dissolved carbon dioxide (CO2). Al(OH)3 precipitates from aluminium
ions (Al³⁺) that are released from negatively charged exchange
sites. Both CaCO3 and Al(OH)3 consume protons (H⁺) when they
dissolve and release H⁺ when they precipitate, buffering soil pH (ref. 5).
Under typical laboratory conditions, soils in equilibrium with CaCO3
and atmospheric CO2 have a pH of 8.2 (see Methods), whereas the pH of
soils that contain exchangeable Al³⁺ is on average 5.1 (see Methods).
The presence of CaCO3 and Al(OH)3 is reflected in soil pH across a
wide range of CaCO3 and exchangeable Al³⁺ concentrations (Extended
Data Fig. 1).
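The value of 8.2 for calcite in equilibrium with atmospheric CO2 can be approximated with a textbook ideal-solution calculation. The sketch below makes the following assumptions:

- Common equilibrium constants at 25 °C (pKH = 1.47, pK1 = 6.35, pK2 = 10.33, pKsp = 8.48).
- pCO2 = 10^−3.4 atm, roughly 400 p.p.m.
- No activity corrections.

These are our assumptions and may differ from the calculation referred to in Methods. The code solves the charge balance of the CaCO3–CO2–water system by bisection on pH.

```python
# Equilibrium constants at 25 °C (assumed textbook values; ideal solution)
LOG_KH = -1.47    # CO2(g) <-> H2CO3*
LOG_K1 = -6.35    # H2CO3* <-> H+ + HCO3-
LOG_K2 = -10.33   # HCO3- <-> H+ + CO3(2-)
LOG_KSP = -8.48   # calcite: CaCO3 <-> Ca(2+) + CO3(2-)
LOG_KW = -14.0    # water autoionization


def charge_imbalance(ph: float, log_pco2: float) -> float:
    """Positive minus negative charge (mol L-1) at a given pH."""
    h = 10.0 ** -ph
    h2co3 = 10.0 ** (LOG_KH + log_pco2)
    hco3 = 10.0 ** LOG_K1 * h2co3 / h
    co3 = 10.0 ** LOG_K2 * hco3 / h
    ca = 10.0 ** LOG_KSP / co3          # calcite saturation fixes Ca2+
    oh = 10.0 ** LOG_KW / h
    return 2.0 * ca + h - (hco3 + 2.0 * co3 + oh)


def calcite_ph(log_pco2: float = -3.4, lo: float = 6.0, hi: float = 11.0,
               tol: float = 1e-9) -> float:
    """Bisection: the imbalance decreases monotonically with pH."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if charge_imbalance(mid, log_pco2) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(f"pH of calcite-buffered water at pCO2 = 10^-3.4 atm: {calcite_ph():.2f}")
```

Raising pCO2, as happens in soil air, lowers the equilibrium pH. This is one reason why field values scatter around the laboratory reference.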


Local studies of climate gradients have shown that the relative
importance of these two buffers is determined by leaching, which
removes Ca²⁺ from the soil. In climates where evaporative demand
exceeds precipitation, leaching rates are low, and dissolved Ca²⁺
accumulates as CaCO3, buffering soil pH near 8.2 (ref. 4). Conversely, in
climates where precipitation exceeds evaporative demand, water
leaches through the soil, removing Ca²⁺ and allowing accumulation of
relatively immobile Al³⁺, which buffers soil pH near 5.1 (ref. 4). Because
runoff and leaching rates increase abruptly as precipitation exceeds
evaporative demand, the transition between CaCO3- and Al(OH)3-buffered
conditions is expected to occur over a small range of climatic
forcing, creating a steep threshold in soil pH at the transition point
between arid and humid climates.

However, whereas leaching controls the loss rate of Ca²⁺, topography
and mineralogy control the supply rate of Ca²⁺ to the soil solution via
erosion and weathering, and thus interact with climate to influence
soil pH over long timescales. For instance, calcium-containing minerals
may be terminally depleted in old, low-relief landscapes that have
been leached in the past, limiting Ca²⁺ supply to the soil solution and
creating Al(OH)3-buffered soils under arid conditions. Alternatively,
soils with short residence times in steep landscapes or in areas
dominated by Ca-rich rock can be rapidly supplied with Ca²⁺ from
weathering, counteracting the accumulation of exchangeable Al³⁺ (ref. 12).
Variation in the Ca²⁺ supply rate is challenging to constrain at global
scales, and might obscure the fundamental relationship between
climate and soil pH. Thus, it is unclear whether the threshold in soil pH
predicted by theory prevails globally.

Nonetheless, we can search for the pH threshold at the global scale,
given sufficiently extensive sampling. Statistically derived soil maps
provide a tempting tool for validation. However, these maps rely
on spatial projections of soil taxonomy that are sometimes explicitly
defined by climate, and so would provide circular evidence. Thus, to test
our hypotheses, we used actual measurements sampled from public
databases of soil profiles (Extended Data Table 1). We focused
on pH in the subsoil (defined here as soil at a depth of 0.5 m), to
avoid effects of land use and vegetation that might obscure the
underlying geochemical signal. To overcome spatial biases in the databases
(for example, heavy sampling in the USA), we developed a simple
re-sampling approach that selects soil profiles randomly with respect to
geographic space (Extended Data Fig. 2). We then associated these pH
measurements with 1° gridded estimates of mean annual precipitation
(MAP) and a model of potential evapotranspiration (PET), which
represents evaporative demand. This allowed us to separate
water-limited climates where leaching rates are low (MAP minus PET < 0)
from energy-limited climates where leaching rates are high (MAP
minus PET > 0).

Globally, the relationship between soil pH and MAP minus PET
conforms closely to predictions. Soil pH at 0.5 m depth has two modes
that approximate 8.2 and 5.1, the values associated with CaCO3 and
Al(OH)3 buffers (Fig. 1). Where MAP minus PET approaches 0, there
is a steep threshold between the two modes (Fig. 1). If we predict that
leaching drives an immediate transition between CaCO3 and Al(OH)3
equilibria where MAP minus PET = 0, the predictions explain 42%
(interquartile range 42%–45%) of the observed variation in pH (here
'variation' means the median absolute difference from the median; see
Methods). The threshold pattern is robust, appearing when MAP is
balanced with an ensemble estimate of actual evapotranspiration rates, or
with simpler models of PET driven by different environmental data sets
(Extended Data Fig. 3). Soil pH is not as strongly bimodal in surface
soil, which may be affected by organic matter, biological cycling of
Ca²⁺ and agricultural liming, but the fundamental nonlinear pattern
is still present (Extended Data Fig. 4). Furthermore, around MAP minus
PET = 0, observed CaCO3 concentrations diminish while exchangeable
Al³⁺ increases, supporting the hypothesis that leaching drives a steep
transition from carbonate to aluminium buffering at the global scale
(Extended Data Fig. 5).

Figure 1 | Soil pH at 0.5 m depth versus annual water balance.
Transparent points show a spatial sample of 20,000 measurements of soil
pH at 0.5 m depth. Side panels show histograms of MAP minus PET and
soil pH, and yellow lines show the predicted pH values of CaCO3-buffered
soils (8.2) and Al(OH)3-buffered soils (5.1).
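The threshold prediction and the variation it explains can be sketched as follows. The step rule, with pH 8.2 where MAP minus PET < 0 and 5.1 otherwise, follows the Methods. The definition of the explained fraction as 1 − MAD(residuals)/MAD(observed) is our assumption, where MAD is the median absolute difference from the median. The data are toy values for illustration only, not values from the study.

```python
import statistics
from typing import Sequence


def predicted_ph(map_minus_pet: float) -> float:
    """Threshold model: calcite buffering if MAP - PET < 0, gibbsite buffering otherwise."""
    return 8.2 if map_minus_pet < 0 else 5.1


def median_abs_deviation(values: Sequence[float]) -> float:
    """Median absolute difference from the median."""
    centre = statistics.median(values)
    return statistics.median([abs(v - centre) for v in values])


def fraction_explained(observed_ph: Sequence[float], map_minus_pet: Sequence[float]) -> float:
    """Assumed definition: 1 - MAD(residuals) / MAD(observed)."""
    residuals = [obs - predicted_ph(wb) for obs, wb in zip(observed_ph, map_minus_pet)]
    return 1.0 - median_abs_deviation(residuals) / median_abs_deviation(observed_ph)


# Toy data for illustration only (not from the study)
ph = [8.0, 8.3, 7.9, 5.0, 5.4, 6.8]
water_balance_mm = [-500.0, -300.0, -50.0, 200.0, 600.0, 100.0]
print(f"fraction of variation explained: {fraction_explained(ph, water_balance_mm):.3f}")
```

A median-based measure is a natural choice for strongly bimodal data, where the variance would be dominated by the gap between the two modes.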


There are intriguing deviations from the central pattern, notably
where acid soils occur in arid climates. These soils cluster into five
regions: (1) the Sahel, (2) southern Africa, (3) northeastern Brazil,
(4) Australia, and (5) mountains in western North America (Fig. 2).
These acid soils may form where MAP minus PET is negative, but
where appreciable leaching still occurs because of seasonal rainfall
or snowmelt. Additionally, geologic constraints on Ca²⁺ supply may
explain acid soils in regions (1) to (4), which are low-relief continental
surfaces where erosion is probably limited, and where conditions were
more humid during the Last Glacial Maximum. Although the role
of palaeoclimate in creating these acid soils is challenging to evaluate
quantitatively, they are generally most common in environments that are
both seasonal and low-relief (Extended Data Fig. 6). The prevalence of acid
soils in arid, low-relief landscapes is consistent with the idea that
depletion of Ca-bearing minerals might irreversibly constrain pH over long
timescales, even in dry climates.

Fewer deviations towards alkaline pH exist in humid climates.
However, some humid soils have pH values exceeding 6.5. These
measurements are scattered across several regions, including (6)
southern China, (7) northern and central Europe, (8) northeastern North
America, and (9) and (10) the Pacific Rim (Fig. 2). In regions (6) to
(8), carbonate rocks are a major component of bedrock lithology,
and might prevent soil acidification by sustaining Ca²⁺ supply to the
soil solution. Comparison with a global lithologic map shows that
soil profiles in the wettest quartile of MAP minus PET are 2.6 times
more likely to have a pH value > 6.5 when they fall within 1° grid cells
containing carbonate bedrock (Extended Data Fig. 7). In regions (9)
and (10), active volcanoes may produce easily weathered silicate
minerals that could buffer pH outside the range of Al(OH)3 equilibrium.
More generally, we observe that humid-climate soils are less acidic in
high-relief landscapes (Extended Data Fig. 7), where high soil
production rates may increase the availability of fresh Ca-containing
minerals, increasing Ca²⁺ supply to the soil solution and thus counteracting
accumulation of exchangeable Al³⁺.

Intriguingly, the bimodal shape of the soil pH distribution indicates
that soils in the neutral pH range (pH 6–7) are uncommon relative to
soils in the CaCO3- and Al(OH)3-buffered ranges. Soils in this pH region
are thought to be buffered by mineral weathering reactions. In theory,
the capacity of these reactions to neutralize H⁺ is limited by the relatively
slow kinetics of primary mineral dissolution, and so neutral-range
soils may evolve towards CaCO3 and Al(OH)3 equilibria over time.
Not coincidentally, neutral-range soils are intensively cultivated,
because they cluster in sub-humid climates with sufficient rainfall
for agriculture, but retain nutrients more effectively than acid soils.
Thus, from an observational standpoint, the most naturally fertile
soils are relatively uncommon, and, hypothetically, their relatively low
prevalence may result from intrinsic aspects of their chemistry.

Figure 2 | Outliers from the global relationship between pH and MAP
minus PET. Points show the centres of 1° grid cells containing soil profiles
(n = 4,488). Cells in the driest quartile of MAP minus PET in which the
majority of profiles have pH < 6.5 are plotted in red, whereas cells in the
wettest quartile of MAP minus PET in which the majority of profiles
have pH > 6.5 are plotted in blue, with the remaining cells shown in grey.
Numbered regions are listed in the text.

By assuming a threshold between CaCO3-buffered and Al(OH)3- 
buffered domains where MAP = PET, we can explain 42% (interquartile 
range 42%-45%) of the global variation in soil pH. The strength of this 
pattern indicates that a small number of specific chemical and physical 
mechanisms govern soil pH at the global scale. Moreover, by using this 
pattern as a guide, we can identify soils that appear out of equilibrium 
with modern climate. The distribution of these soils suggests a range 
of new questions that apply to the timescales of soil development: are 
acid soils in arid, low-relief environments irreversibly leached? Can 
erosion maintain high pH at a steady state in humid climates? And 
are neutral-range soils less common because they are poorly buffered? 
The answers to these questions are relevant at the timescale of human 
societies. Rapid changes in water balance caused by climate or land-use 
change might leave an increasing number of soils out of equilibrium 
with climate, with unknown consequences for their capacity to support 
productivity in natural and managed ecosystems. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Evaporation models. We use potential evapotranspiration (PET) to represent the 
evaporative component of climate rather than actual evapotranspiration (AET), 
because PET is independent of precipitation, and thus carries more information 
about arid climates. Specifically, we reason that climates that are close to the 
MAP = PET transition are more likely to have been leached in the recent past than 
climates where PET greatly exceeds MAP, even if both climates have comparably 
small modern values of MAP minus AET. In this sense, modern PET may provide 
a better index of long-term leaching rates than modern AET. 

We represented PET using two contrasting approaches. In the first approach,
we represented PET using a modified Penman-Monteith-Leuning model, which
estimates evaporation as a function of net radiation (Rn), air temperature,
vapour-pressure deficit, and the aerodynamic and surface conductance of
vegetation. This approach is biophysically detailed, but it requires many
parameters. Thus, in the second approach, we represented PET using the
comparatively simple Priestley-Taylor equation, which models evaporative demand
as a function of net radiation, air temperature and a scaling parameter, α
(ref. 29). We also explored the relationship between soil pH and the difference
between MAP and AET, which we represented using the mean of the diagnostic data
included in the LandFlux-EVAL synthesis. By definition, MAP minus AET cannot take
negative values, but approaches zero where PET > MAP. We report values of MAP
minus AET without imposing this constraint, so some modelled AET values exceed
MAP, resulting in slightly negative values.
Modified Penman-Monteith-Leuning model. The Penman-Monteith-Leuning model
partitions evaporation between the plant canopy (Ec) and the soil (Es).
Ec is estimated using the Penman-Monteith equation, whereas evaporation
from soil is assumed to equal the equilibrium rate, modified by a moisture
constraint. Because we were interested in obtaining an estimate of PET, we did
not include a soil moisture constraint on evaporation, and assumed that
PET was equal to the sum of canopy and soil evaporation. Evaporation from wet
soil can be approximated by multiplying the equilibrium evaporation rate by the
Priestley-Taylor coefficient, α (ref. 31). Thus, we substituted the Priestley-Taylor
model for the equilibrium model to represent soil evaporation in the
Penman-Monteith-Leuning formula. The combined evaporation from canopy and soil
is given by:


$$\lambda E_{\mathrm{tot}} = \frac{s A_c + \rho c_p D_a G_a}{s + \gamma\left(1 + G_a/G_c\right)} + \alpha\,\frac{s A_s}{s + \gamma} \tag{1}$$


where Ac and As are the available energy absorbed by the canopy and soil (Rn minus
soil heat flux, in MJ m⁻² d⁻¹), λ is the latent heat of vaporization of water
(MJ kg⁻¹), s is the slope of the saturation vapour pressure curve (kPa °C⁻¹),
ρ is the density of air (kg m⁻³), cp is the specific heat of air at constant pressure
(MJ kg⁻¹ °C⁻¹), Da is the vapour pressure deficit (kPa), γ is the psychrometric
constant (kPa °C⁻¹), Ga is the aerodynamic conductance (m d⁻¹) and Gc is the
canopy conductance (m d⁻¹). Radiation is partitioned between canopy and soil
by two equations:


$$A_c = A_{\mathrm{tot}}\left(1 - e^{-k_A L}\right) \tag{2}$$


$$A_s = A_{\mathrm{tot}} - A_c \tag{3}$$


where Atot is equal to Rn (the soil heat flux is assumed to be negligible), L is the
leaf area index and kA is an extinction coefficient. Canopy conductance (Gc) is
constrained by maximum stomatal conductance (gsx), and modified by factors that
represent dependence on light availability and vapour-pressure deficit:


$$G_c = \frac{g_{sx}}{k_Q}\,\ln\!\left[\frac{Q_h + Q_{50}}{Q_h \exp(-k_Q L) + Q_{50}}\right]\left[\frac{1}{1 + D_a/D_{50}}\right] \tag{4}$$


where Qh is photosynthetically active radiation at the top of the canopy (half of
the incoming shortwave radiation), Q50 is a half-saturation constant for Qh, D50
is a half-saturation constant for vapour-pressure deficit, and kQ is the extinction
coefficient for shortwave radiation.

Most parameters were obtained from a regional implementation of the
Penman-Monteith-Leuning model. The parameters kA and kQ were both set equal to
0.6 m⁻¹, while Q50 and D50 were set equal to 2.6 MJ m⁻² d⁻¹ and 0.8 kPa, respectively
(ref. 32). The maximum stomatal conductance, gsx, was set equal to 0.006 m s⁻¹, which
is a reasonable mean estimate for natural vegetation, and scaled to a daily time step.
The aerodynamic conductance, Ga, is influenced by wind speed and vegetation
height. Because reliable maps of both these parameters are unavailable at a global
scale, we used biome-specific parameters, assigning forests and savannas a value
of 0.033 m s⁻¹, shrublands a value of 0.0125 m s⁻¹, and grasslands, cropland and
barren areas 0.01 m s⁻¹. All other parameters were calculated or obtained from the
Food and Agriculture Organization (FAO) guidelines.
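Equations (1)–(4) can be combined into a single PET function. The sketch below uses the parameter values listed above (kA = kQ = 0.6, Q50 = 2.6 MJ m⁻² d⁻¹, D50 = 0.8 kPa, gsx = 0.006 m s⁻¹ scaled to m d⁻¹, and α = 1.26). It also rests on the following assumptions:

- s, γ, ρ, cp and λ are taken from the FAO-56 forms. We assume these are the relevant "FAO guidelines" formulas, but the text does not name them.
- λ is treated as a constant 2.45 MJ kg⁻¹.
- Where L = 0, the canopy term is set to zero.
- The inputs in the example call are hypothetical.

```python
import math

CP_AIR = 1.013e-3        # specific heat of air, MJ kg-1 °C-1 (FAO-56)
LAMBDA = 2.45            # latent heat of vaporization, MJ kg-1 (FAO-56 simplification)
SECONDS_PER_DAY = 86400.0


def sat_vp_slope(t_c: float) -> float:
    """Slope of the saturation vapour pressure curve, kPa °C-1 (FAO-56 form)."""
    es = 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))
    return 4098.0 * es / (t_c + 237.3) ** 2


def psychrometric_constant(p_kpa: float) -> float:
    """Psychrometric constant, kPa °C-1 (FAO-56 form)."""
    return 0.665e-3 * p_kpa


def air_density(t_c: float, p_kpa: float) -> float:
    """Air density, kg m-3, with FAO-56's virtual-temperature approximation."""
    return p_kpa / (1.01 * (t_c + 273.0) * 0.287)


def canopy_conductance(q_h, lai, d_a, g_sx, k_q=0.6, q50=2.6, d50=0.8):
    """Equation (4); g_sx and the result in m d-1, q_h and q50 in MJ m-2 d-1."""
    if lai <= 0.0:
        return 0.0
    light = math.log((q_h + q50) / (q_h * math.exp(-k_q * lai) + q50))
    return (g_sx / k_q) * light / (1.0 + d_a / d50)


def pml_pet(a_tot, t_c, p_kpa, d_a, q_h, lai, g_a,
            g_sx=0.006 * SECONDS_PER_DAY, k_a=0.6, alpha=1.26):
    """Equations (1)-(4): PET in mm d-1 from available energy a_tot (MJ m-2 d-1)."""
    s = sat_vp_slope(t_c)
    gamma = psychrometric_constant(p_kpa)
    a_c = a_tot * (1.0 - math.exp(-k_a * lai))        # equation (2)
    a_s = a_tot - a_c                                  # equation (3)
    g_c = canopy_conductance(q_h, lai, d_a, g_sx)
    canopy = 0.0
    if g_c > 0.0:                                      # bare ground: no canopy term
        rho = air_density(t_c, p_kpa)
        canopy = (s * a_c + rho * CP_AIR * d_a * g_a) / (s + gamma * (1.0 + g_a / g_c))
    soil = alpha * s * a_s / (s + gamma)               # Priestley-Taylor soil term
    return (canopy + soil) / LAMBDA                    # MJ m-2 d-1 -> mm d-1


# Hypothetical inputs for a forest-like 1° cell
forest = pml_pet(a_tot=10.0, t_c=20.0, p_kpa=101.3, d_a=1.0, q_h=8.0,
                 lai=4.0, g_a=0.033 * SECONDS_PER_DAY)
print(f"forest-like cell: {forest:.2f} mm/day")
```

With L = 0, all energy reaches the soil and the function reduces exactly to the Priestley-Taylor form. This is a useful sanity check on the soil substitution described above.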

Priestley-Taylor model. The Priestley-Taylor model for PET uses a single
parameter, α, to account for the adiabatic component of latent heat transfer. While
α may vary as a function of meteorological conditions, a standard α value of
1.26 has been applied successfully at large scales. Priestley-Taylor PET is given
by the equation:


$$\lambda E_{\mathrm{tot}} = \frac{\alpha\, s A}{s + \gamma} \tag{5}$$


where A is total available energy (equal to Rn) and α = 1.26. Other parameters
are as defined above.
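Equation (5) is simple enough to evaluate directly. The sketch below divides λEtot by λ to obtain a depth of water per day. It assumes the FAO-56 forms for s and γ and a constant λ = 2.45 MJ kg⁻¹, which are our choices rather than values stated in the text, and the inputs in the example call are hypothetical.

```python
import math

LAMBDA = 2.45  # latent heat of vaporization, MJ kg-1 (assumed FAO-56 constant)


def sat_vp_slope(t_c: float) -> float:
    """Slope of the saturation vapour pressure curve, kPa °C-1 (FAO-56 form)."""
    es = 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))
    return 4098.0 * es / (t_c + 237.3) ** 2


def priestley_taylor_pet(a_mj_m2_d: float, t_c: float, p_kpa: float,
                         alpha: float = 1.26) -> float:
    """Equation (5) divided by lambda: PET in mm d-1."""
    s = sat_vp_slope(t_c)
    gamma = 0.665e-3 * p_kpa          # psychrometric constant, kPa °C-1 (FAO-56)
    return alpha * s * a_mj_m2_d / (s + gamma) / LAMBDA


# Hypothetical cell: 10 MJ m-2 d-1 available energy, 20 °C, sea-level pressure
print(f"{priestley_taylor_pet(10.0, 20.0, 101.3):.2f} mm/day")
```

Because α enters as a simple multiplier, PET scales linearly with α. Any uncertainty in the 1.26 value therefore carries straight through to MAP minus PET.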

Precipitation dataset. We estimated mean annual precipitation (MAP) using a
1° gridded map created from the Global Precipitation Climatology Center Full
Data Reanalysis, Version 7.0. MAP was calculated as the mean annual sum of
monthly precipitation values for the years 1961–2001. We use this 40-year interval
because it includes a high spatial coverage of rain-gauge stations. We corrected
for systematic rain-gauge measurement error using static monthly under-catch
corrections provided by the Global Precipitation Climatology Center.

Driving data for PET. Both the Penman-Monteith-Leuning and Priestley-Taylor
models require monthly estimates of Rn and air temperature, and the
Penman-Monteith-Leuning model additionally requires monthly estimates of vapour
pressure, atmospheric pressure, surface shortwave radiation, leaf area index and land
cover type (Extended Data Table 2). For both approaches, environmental variables
obtained for multi-year time series were collapsed to monthly means of daily values
before calculation of PET. PET was then scaled from daily to monthly values and
summed to obtain annual PET. To test the sensitivity of our results to driving data,
we used two radiation data sets: mean monthly values from the NASA/CERES
energy-balanced and filled surface radiation budget, version 2.8, over the years
2001–2014, and mean monthly values from the NASA/GEWEX surface radiation budget
version 3.0 over the years 1984–2007. We obtained mean monthly values of air
temperature and vapour pressure from the CRU TS3.13 data set, a gridded
climatology at 0.5° resolution interpolated from weather station measurements,
which we averaged over 1961–2001, the period of maximum weather station
coverage. Atmospheric pressure was obtained using mean elevations from the
ETOPO1 global digital elevation model in each 1° cell, corrected using the
ideal gas law. Land cover classes were obtained from the NASA MODIS satellite
product MOD12, and monthly means of leaf area index were obtained from the
MODIS-derived Global Land Surface Satellite leaf area index data set, averaged
over the period 2001–2012. All data at a higher resolution than 1° were aggregated
to mean values at 1° resolution before calculation of PET.
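The daily-to-annual scaling can be sketched as follows. Each month's mean daily PET is multiplied by the number of days in that month, and the twelve monthly totals are summed. The choice of a non-leap reference year (2001) is our assumption, and the monthly rates in the example are hypothetical.

```python
import calendar
from typing import Sequence


def annual_total_from_daily_means(daily_means_by_month: Sequence[float],
                                  year: int = 2001) -> float:
    """Scale 12 monthly-mean daily rates (mm d-1) to monthly totals and sum them."""
    if len(daily_means_by_month) != 12:
        raise ValueError("expected one value per month")
    return sum(rate * calendar.monthrange(year, month)[1]   # days in this month
               for month, rate in enumerate(daily_means_by_month, start=1))


# Hypothetical monthly-mean daily PET for a temperate cell (mm d-1)
monthly_pet_mm_per_day = [1.0, 1.5, 2.5, 3.5, 4.5, 5.0, 5.5, 5.0, 4.0, 2.5, 1.5, 1.0]
print(f"annual PET: {annual_total_from_daily_means(monthly_pet_mm_per_day):.0f} mm")
```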

Rainfall seasonality. We quantified rainfall seasonality by computing the coeffi- 
cient of variation of under-catch corrected monthly rainfall values from the Global 
Precipitation Climatology Center data set. 
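The seasonality index is the ratio of the standard deviation to the mean of the twelve monthly values. The sketch below assumes the population standard deviation, because the text does not say whether the population or sample form was used. The two example series are hypothetical.

```python
import statistics
from typing import Sequence


def rainfall_seasonality(monthly_mm: Sequence[float]) -> float:
    """Coefficient of variation of monthly rainfall (population SD / mean)."""
    mean = statistics.mean(monthly_mm)
    if mean == 0:
        raise ValueError("undefined for zero mean rainfall")
    return statistics.pstdev(monthly_mm) / mean


uniform = [50.0] * 12                    # hypothetical: rain spread evenly
single_wet_month = [0.0] * 11 + [120.0]  # hypothetical: all rain in one month
print(rainfall_seasonality(uniform), rainfall_seasonality(single_wet_month))
```

Perfectly even rainfall gives 0. Concentrating the same annual total into one month gives the maximum possible value for 12 months, √11 ≈ 3.3.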

Local relief. We estimated local topographic relief from the 1-arcminute resolution
ETOPO1 digital elevation model. Local relief was calculated as the difference
between maximum and minimum elevations within a 10-km radius of each 
1-arcminute cell centre. Local relief at 1° resolution was then calculated as the 
median relief within each 1° cell. 
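The two-step relief calculation can be sketched on a small grid. The first step takes, for each cell, the maximum minus the minimum elevation within a circular window. The second step takes the median of those values across a coarse cell. The sketch treats grid cells as equal squares and expresses the radius in cells. This is a simplifying assumption: 10 km is roughly 5 arcminute cells near the equator, but east–west cell widths shrink with latitude. The toy elevation grid is hypothetical.

```python
import statistics
from typing import List

Grid = List[List[float]]


def local_relief(elevation: Grid, radius_cells: int) -> Grid:
    """Max minus min elevation within a circular window (in cells) around each cell."""
    rows, cols = len(elevation), len(elevation[0])
    offsets = [(di, dj)
               for di in range(-radius_cells, radius_cells + 1)
               for dj in range(-radius_cells, radius_cells + 1)
               if di * di + dj * dj <= radius_cells * radius_cells]
    relief = []
    for i in range(rows):
        row = []
        for j in range(cols):
            window = [elevation[i + di][j + dj] for di, dj in offsets
                      if 0 <= i + di < rows and 0 <= j + dj < cols]
            row.append(max(window) - min(window))
        relief.append(row)
    return relief


def coarse_cell_relief(relief: Grid) -> float:
    """Median of the fine-cell relief values falling inside one coarse (1°) cell."""
    return statistics.median(value for row in relief for value in row)


toy_dem = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
toy_relief = local_relief(toy_dem, radius_cells=1)
print(toy_relief, coarse_cell_relief(toy_relief))
```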

Carbonate lithology. We represented the extent of carbonate lithology using the 
Global Lithologic Map (GLiM). We determined which 1° grid cells contained
carbonate rocks by subsetting the 0.5° raster version of GLiM for carbonate lithol- 
ogy, and then identifying all 1° cells that contained at least one 0.5° cell classified 
as carbonate rock. 
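The "any carbonate sub-cell" rule amounts to an OR-aggregation of 2 × 2 blocks. The sketch below assumes that the 0.5° raster is aligned with the 1° grid, so that each 1° cell covers exactly four 0.5° cells. It also assumes the raster has already been reduced to a boolean carbonate mask.

```python
from typing import List

Mask = List[List[bool]]


def carbonate_one_degree(half_degree: Mask) -> Mask:
    """Flag each 1° cell that contains at least one carbonate 0.5° cell."""
    rows, cols = len(half_degree) // 2, len(half_degree[0]) // 2
    return [
        [any(half_degree[2 * i + di][2 * j + dj] for di in (0, 1) for dj in (0, 1))
         for j in range(cols)]
        for i in range(rows)
    ]


# One row of two 1° cells; only the right-hand cell contains carbonate
mask = [[False, False, False, True],
        [False, False, False, False]]
print(carbonate_one_degree(mask))
```

This rule is deliberately inclusive: a single carbonate sub-cell flags the whole 1° cell.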

Soil profile data. We combined data from eight soil profile databases (Extended
Data Table 1). Profiles were included if they were non-duplicated and included
measurements of pH in soil–water suspension. We used pH in water rather than
pH in CaCl2 or KCl solutions because pH in water is reported much more
frequently than pH in salt solutions. Data at 0.5 m and 0.1 m depth were obtained
by selecting the horizon of each profile intersected by the corresponding depth.
We selected absolute depths of 0.5 m and 0.1 m rather than soil horizons because
horizon nomenclature varied across data sets. Although the choice of depths is
somewhat arbitrary, the depths were selected to span the depths at which
biological cycling typically influences cation concentrations. Using the National
Cooperative Soil Survey (NCSS) database as a reference, 0.5 m approximates the
median value for the top of the B horizon (0.52 m) and 0.1 m approximates the
median value for the midpoint of the A horizon (0.09 m). The total number of
profiles included was 60,291 at 0.5 m depth and 67,900 at 0.1 m depth (Extended
Data Table 1).
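Selecting "the horizon intersected by the corresponding depth" is a simple interval lookup. The sketch below uses hypothetical horizon records. It assumes a boundary convention that the text does not specify: the upper boundary is inclusive and the lower boundary exclusive, so that a depth lying exactly on a boundary is assigned to the deeper horizon.

```python
from typing import NamedTuple, Optional, Sequence


class Horizon(NamedTuple):
    top_m: float      # upper boundary depth, m
    bottom_m: float   # lower boundary depth, m
    ph_water: float   # pH measured in soil-water suspension


def horizon_at_depth(horizons: Sequence[Horizon], depth_m: float) -> Optional[Horizon]:
    """Return the horizon whose depth interval contains depth_m, or None."""
    for horizon in horizons:
        # assumed convention: upper boundary inclusive, lower boundary exclusive
        if horizon.top_m <= depth_m < horizon.bottom_m:
            return horizon
    return None


# Hypothetical profile with three horizons
profile = [Horizon(0.0, 0.2, 6.1), Horizon(0.2, 0.6, 6.8), Horizon(0.6, 1.2, 7.4)]
subsoil = horizon_at_depth(profile, 0.5)   # 0.5 m sample
surface = horizon_at_depth(profile, 0.1)   # 0.1 m sample
print(subsoil, surface)
```

Profiles too shallow to reach a given depth return None and would simply drop out of that depth's data set.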

Dilution ratio correction. The soil-to-water ratio of the slurry used to measure 
soil pH varied across data sets. To account for the effects of the soil-to-water ratio, 
data reported for a 1:5 ratio were corrected to a 1:1 ratio using linear correction 
factors*’. We could not obtain correction factors for data measured at a ratio of 
1:2.5, and so left these data uncorrected. Including uncorrected data is unlikely to 
drive large errors in the global pH distribution because changing the soil-to-water 
ratio from 1:1 to 1:5 shifts pH by about 0.5 units*’, which is small relative to the 
global range of soil pH values. Data measured in water without a reported ratio
were assumed to be measured at ratios of 1:1 or 1:2.5. 
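The ratio harmonization rule described above is sketched below. The correction slope and intercept for 1:5 data are left as required arguments because the published correction factors are not reproduced in this text. Following the description, 1:2.5 data and data with no reported ratio pass through unchanged. The function name is hypothetical.

```python
from typing import Optional


def harmonise_ph(ph: float, ratio: Optional[str], slope: float, intercept: float) -> float:
    """Express a water pH measurement on a 1:1 soil-to-water basis.

    slope and intercept are the linear correction factors for 1:5 data and
    must be taken from the published correction (not reproduced here).
    """
    if ratio == "1:5":
        return intercept + slope * ph
    if ratio in ("1:1", "1:2.5", None):
        # 1:1 needs no correction; 1:2.5 and unreported ratios are left uncorrected.
        return ph
    raise ValueError(f"unexpected soil-to-water ratio: {ratio!r}")
```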

Statistical analyses. Soil profile data were spatially resampled. In this approach, 
individual soil profiles were selected based on proximity to randomly distributed 
sampling nodes (n = 20,000). Sampling nodes were drawn from grid-cell centres at 
1° resolution, with sampling weights based on cell area and allowing replacement. 
Nodes that were more than 100 km from a soil profile were not sampled, to minimize
edge biases. Soil profiles were selected by identifying the closest grid cell to
each node that contained profiles, and then randomly drawing a profile from the 
total set of profiles in the cell. By design, this approach includes individual profiles 
multiple times in the resampled data set, with the consequence that geographically 
isolated profiles are included more frequently than profiles in densely sampled 
areas. This approach has no statistical derivation, but it produces sampling
distributions that appear less biased than the underlying data (Extended Data Fig. 2).

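The resampling procedure can be sketched as below. This is our reading of the text, not the archived code. It makes several assumptions. Cell area is taken as proportional to the cosine of latitude. The 100-km test uses the great-circle distance from the node (a 1° cell centre) to every profile. The "closest grid cell containing profiles" is found by centre-to-centre distance. The function names are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance (km); arguments in degrees, broadcastable."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def spatial_resample(prof_lat, prof_lon, n_nodes=20_000, max_dist_km=100.0, seed=None):
    """Return indices of profiles drawn by area-weighted spatial resampling."""
    rng = np.random.default_rng(seed)
    prof_lat = np.asarray(prof_lat, dtype=float)
    prof_lon = np.asarray(prof_lon, dtype=float)

    # Centres of all 1-degree cells, sampled with replacement, weighted by area.
    lat_c, lon_c = np.meshgrid(np.arange(-89.5, 90.0), np.arange(-179.5, 180.0), indexing="ij")
    lat_c, lon_c = lat_c.ravel(), lon_c.ravel()
    weights = np.cos(np.radians(lat_c))
    nodes = rng.choice(lat_c.size, size=n_nodes, replace=True, p=weights / weights.sum())

    # Index of the 1-degree cell holding each profile (same ordering as lat_c/lon_c).
    row = np.clip(np.floor(prof_lat + 90).astype(int), 0, 179)
    col = np.clip(np.floor(prof_lon + 180).astype(int), 0, 359)
    prof_cell = row * 360 + col
    occupied = np.unique(prof_cell)

    selected = []
    for node in nodes:
        # Skip nodes farther than max_dist_km from every profile (edge-bias control).
        if haversine_km(lat_c[node], lon_c[node], prof_lat, prof_lon).min() > max_dist_km:
            continue
        # Closest occupied cell to the node, then one profile drawn at random from it.
        d_cell = haversine_km(lat_c[node], lon_c[node], lat_c[occupied], lon_c[occupied])
        cell = occupied[np.argmin(d_cell)]
        selected.append(rng.choice(np.flatnonzero(prof_cell == cell)))
    return np.array(selected, dtype=int)
```

Because every node resolves to its nearest occupied cell, an isolated profile is drawn far more often than any single profile in a densely sampled cell. This is the intended de-clustering effect described in the text.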
Water-balance model evaluation. To evaluate the relationship between MAP 
minus PET and soil pH, we compared observations to theoretical predictions based 
on calcite and gibbsite buffering systems. For all soils in grid cells where MAP 
minus PET <0, the predicted pH was 8.2, and for all remaining profiles, the pre- 
dicted pH was 5.1. Residuals from the model were then computed by subtracting 
predicted values from observed values. Because the data are bimodally distributed, 
residuals from this model have a heavy-tailed distribution, and measures of variation
based on squared errors (for example, the coefficient of determination, R²) are
inappropriate*!. Instead, we estimated variation in the data using a robust measure 
of dispersion, the median absolute difference from the median (MAD). We then 
gauged model fit by comparing the MAD of the residuals to the MAD of the data: 
the percentage variation explained was equal to 1 − MAD(residuals)/MAD(data).
This metric is analogous to R², but makes no assumption about the distribution of
the data or residuals. We estimated the uncertainty in the percentage of variation 
explained by resampling the data with replacement 10,000 times® and calculating 
the interquartile range of the resulting distribution of parameter estimates. 
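The MAD-based fit statistic and its bootstrap interquartile range can be written compactly. The sketch below assumes MAP minus PET and pH arrive as aligned arrays. The two predicted values (8.2 and 5.1) are the ones given in the text, and the function names are hypothetical.

```python
import numpy as np

CALCITE_PH = 8.2   # predicted pH where MAP minus PET < 0
GIBBSITE_PH = 5.1  # predicted pH everywhere else


def mad(x):
    """Median absolute deviation from the median."""
    x = np.asarray(x, dtype=float)
    return np.median(np.abs(x - np.median(x)))


def variation_explained(water_balance, ph):
    """1 - MAD(residuals) / MAD(data) for the two-state buffer model."""
    water_balance = np.asarray(water_balance, dtype=float)
    ph = np.asarray(ph, dtype=float)
    predicted = np.where(water_balance < 0, CALCITE_PH, GIBBSITE_PH)
    residuals = ph - predicted  # observed minus predicted
    return 1.0 - mad(residuals) / mad(ph)


def bootstrap_iqr(water_balance, ph, n_boot=10_000, seed=None):
    """25th and 75th percentiles of the statistic over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    water_balance = np.asarray(water_balance, dtype=float)
    ph = np.asarray(ph, dtype=float)
    n = ph.size
    estimates = np.empty(n_boot)
    for k in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample observations with replacement
        estimates[k] = variation_explained(water_balance[idx], ph[idx])
    q25, q75 = np.percentile(estimates, [25, 75])
    return q25, q75
```

Using the median of absolute residuals keeps the statistic from being dominated by the heavy tails that the bimodal data produce.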
Logistic regression of outliers. We defined ‘outliers’ as soils with pH < 6.5 in 
strongly arid climates (driest quartile of MAP minus PET) and soils with pH > 6.5 
in strongly humid climates (wettest quartile of MAP minus PET). We deliberately 
reduced pH to this categorical expression to emphasize large-scale deviations 
between pH modes, rather than small-scale deviations around each mode. To 
quantify the prevalence of outliers as a function of rainfall seasonality, carbonate 
lithology, and topographic relief, we fitted logistic regressions*>**. Likelihood 
ratio tests were used to compare regressions against the null hypothesis that the 
proportion of outliers is uniform with respect to each predictor®. We ruled out 
possible collinearity between environmental predictors by checking individual 
correlations between predictors in both wet and dry climates. No two predictors 
had correlation coefficients above 0.25, and so we assume that the patterns pre- 
sented are independent. 
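The outlier definition and the likelihood ratio test against a constant outlier proportion are sketched below. The sketch fits a one-predictor logistic regression by Newton-Raphson and computes the 1-degree-of-freedom chi-squared P value from the standard library, using P(χ² > s) = erfc(√(s/2)). It assumes a single continuous or 0/1 predictor per test and no perfect separation. Function names are hypothetical and the authors' software may differ.

```python
import math
import numpy as np


def flag_outliers(water_balance, ph, threshold=6.5):
    """Masks for the driest/wettest quartiles and the outliers within them."""
    wb = np.asarray(water_balance, dtype=float)
    ph = np.asarray(ph, dtype=float)
    q25, q75 = np.percentile(wb, [25, 75])
    dry, wet = wb <= q25, wb >= q75
    return dry, wet, dry & (ph < threshold), wet & (ph > threshold)


def _loglik(y, p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))


def fit_logistic(x, y, n_iter=50, tol=1e-10):
    """Logistic regression y ~ x by Newton-Raphson (iteratively reweighted least squares)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * (p * (1 - p))[:, None])
        step = np.linalg.solve(hessian, gradient)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta, _loglik(y, 1.0 / (1.0 + np.exp(-X @ beta)))


def likelihood_ratio_test(x, y):
    """Chi-squared statistic (1 d.f.) and P value against a constant outlier rate."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    _, ll_full = fit_logistic(x, y)
    ll_null = _loglik(y, np.full_like(y, y.mean()))  # intercept-only model
    stat = 2.0 * (ll_full - ll_null)
    return stat, math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
```

In use, the predictor `x` (rainfall CV, local relief, or a 0/1 carbonate indicator) and the outlier flag `y` would both be restricted to the dry or wet quartile being tested.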

Calcite and aluminium chemistry. We used the NCSS database to validate chem- 
ical calculations and determine the relationship between climate, calcite (CaCO3), 
and exchangeable aluminium (Alx). We used the NCSS database for this purpose 
because it contains a large number of measurements of CaCO3 and Alx using
consistent methods*, and it reports the effective cation exchange capacity, which 
is required for modelling the pH of gibbsite-buffered soils. We used a spatially
resampled subset of 20,000 data points for plotting relationships with the annual 
water balance, following the resampling method above. 

Calcite buffer. The pH of a solution exposed to calcite (CaCO3) and open to the 
atmosphere can be solved using an equation derived from the chemical equilibria 
for CaCO3 (ref. 56):

$$0 = H^{4}\left(\frac{2K_{s}}{K_{1}K_{2}K_{H}P_{\mathrm{CO_2}}}\right) + H^{3} - H\left(K_{w} + K_{1}K_{H}P_{\mathrm{CO_2}}\right) - 2K_{1}K_{2}K_{H}P_{\mathrm{CO_2}} \qquad (6)$$

where H is the hydrogen ion activity in moles, Ks is the solubility constant of CaCO3
(in mol² l⁻²), Kw is the dissociation constant for water (in mol² l⁻²), K1 and
K2 are the first and second dissociation constants of carbonic acid (in mol l⁻¹), KH
is Henry’s constant (in mol l⁻¹ atm⁻¹), and PCO2 is the partial pressure of CO2
(in atm). We solved this equation for H+ at 25 °C and a PCO2 of 3.45 × 10⁻⁴ atm
using the package rootSolve in R and published parameters. The PCO2 value
of 3.45 × 10⁻⁴ atm reflects the PCO2 imposed by laboratory measurement conditions
at standard atmospheric pressure, based on the ambient CO2 mole fraction in
1985, the median measurement date of the data. Older measurements made at
lower atmospheric CO2 levels may reflect a slightly higher calcite equilibrium pH
(that is, the expected pH is 8.3 before 1977). Because this difference is small and
the majority of measurements were taken after this date, we report model fits for
a single PCO2 value.
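The calcite equilibrium can also be solved without R. The paper used rootSolve; this sketch swaps in plain bisection in Python. Bisection is safe here because the quartic's coefficients change sign only once, so it has exactly one positive root. Activities are treated as concentrations. The log10 constants below are commonly tabulated 25 °C values and are an assumption, not necessarily the published parameter set the authors used.

```python
# Illustrative log10 equilibrium constants at 25 degC (commonly tabulated values;
# an assumption, not necessarily the parameter set used in the paper).
LOG_KS = -8.48   # calcite solubility product, mol^2 l^-2
LOG_KW = -14.0   # ion product of water, mol^2 l^-2
LOG_K1 = -6.35   # first dissociation constant of carbonic acid, mol l^-1
LOG_K2 = -10.33  # second dissociation constant of carbonic acid, mol l^-1
LOG_KH = -1.47   # Henry's constant for CO2, mol l^-1 atm^-1


def calcite_residual(h, pco2):
    """Right-hand side of equation (6) for hydrogen ion activity h."""
    ks, kw, k1, k2, kh = (10.0 ** v for v in (LOG_KS, LOG_KW, LOG_K1, LOG_K2, LOG_KH))
    return (h ** 4 * 2 * ks / (k1 * k2 * kh * pco2)
            + h ** 3
            - h * (kw + k1 * kh * pco2)
            - 2 * k1 * k2 * kh * pco2)


def calcite_ph(pco2=3.45e-4, lo=0.0, hi=14.0, iterations=100):
    """Solve equation (6) for pH by bisection on [lo, hi].

    The residual is positive when h is above the single positive root
    (pH too low) and negative when h is below it (pH too high).
    """
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if calcite_residual(10.0 ** -mid, pco2) > 0:
            lo = mid  # h too large, so the root lies at higher pH
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With these constants the root lies close to pH 8.2, in line with the value used in the text. A higher PCO2 lowers the equilibrium pH.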

Calcite concentrations are approximate, and reported as CaCO3 equivalents.
The NCSS database reports CaCO3 equivalents measured using a pressure calcimeter
following acid dissolution, meaning that a range of carbonate minerals are
included in the estimate. Also, because values are reported at a precision of 1%,
some soils with <1% CaCO3 are probably reported with zero values, even if their
pH reflects buffering by CaCO3.

Gibbsite buffer. The pH of a solution exposed to gibbsite (Al(OH)3) in a soil with
exchangeable aluminium (Alx) depends on the ratio of Alx to other exchangeable 
cations (Cax). In nature, the solubility of Al(OH)3 and the exchange coefficients 
of clays do not follow the behaviour of purified laboratory solutions, and so the 
relationship between Alx, Cax and pH must be estimated empirically. To derive 
a typical pH for Al(OH)3-buffered soils, we took the mean of all measurements 
from the spatial sample of the NCSS database with non-zero Alx (pH=5.1). 
Additionally, to validate the theoretical relationship between Cax/Alx and pH, we 
fitted a model to measurements from the NCSS database taken at 0.5 m depth with 
non-zero Alx and effective cation exchange capacity. The Gapon exchange model 
can be used to develop a log-linear relationship between Cax/Alx and pH (ref. 61):

$$\mathrm{pH} = b_{0} + b_{1}\log_{10}\left(\mathrm{Ca_x}/\mathrm{Al_x}\right) \qquad (7)$$

where b0 and b1 are fitted constants. To fit the model, we assumed that Alx was
equal to aluminium extractable in 1 M KCl, and Cax was equal to the effective
cation exchange capacity minus Alx. The data show a log-linear relationship
between Cax/Alx and pH (Extended Data Fig. 1, b0 = 4.96, b1 = 0.32, R² = 0.36,
P<0.01), supporting control of pH by Cax/Alx. However, we note that the rela- 
tionship appears slightly concave-curvilinear, suggesting that the Gapon model fails 
to account for the total activity of Alx. This issue warrants further investigation. 
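The least-squares fit of equation (7) can be sketched as follows, using the same variable definitions as the text: Alx is KCl-extractable aluminium, and Cax is the effective cation exchange capacity (ECEC) minus Alx. Dropping rows with non-positive Alx or Cax is our assumption, needed so the logarithm is defined. The function name is hypothetical.

```python
import numpy as np


def fit_gapon(ph, ecec, al_x):
    """Least-squares fit of pH = b0 + b1 * log10(Cax / Alx).

    Returns (b0, b1, r_squared).
    """
    ph, ecec, al_x = (np.asarray(a, dtype=float) for a in (ph, ecec, al_x))
    ca_x = ecec - al_x  # exchangeable cations other than aluminium
    ok = (al_x > 0) & (ca_x > 0) & np.isfinite(ph)
    x = np.log10(ca_x[ok] / al_x[ok])
    y = ph[ok]
    b1, b0 = np.polyfit(x, y, 1)  # polyfit returns the slope first
    fitted = b0 + b1 * x
    r_squared = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
    return b0, b1, r_squared
```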

Code availability. Code used to spatially resample soil profiles, calculate PET, and
perform statistical analyses is maintained on GitHub and publicly archived online
at http://dx.doi.org/10.5281/zenodo.61996. Code for pre-processing of raw data 
sets is available from the authors upon request. 

Data availability. All soil profile and meteorological data used in this study are 
publicly available from the sources listed in the text and in Extended Data Tables 1 
and 2. Several of the soil profile databases are only available by direct request from 
the providing institutions (see Extended Data Table 1). As such, the combined soil 
profile data set used in this study is available from the authors upon request, given 
permission from providing institutions. 
Extended Data Figure 1 | Soil pH versus calcite and exchangeable aluminium. Transparent points show a spatial sample of 20,000
measurements from the NCSS database. a, The relationship between soil pH at 0.5 m and CaCO3 equivalents as a mass percentage. The yellow
line shows the calculated pH of a solution in equilibrium with calcite and atmospheric CO2 (345 parts per million) at 25 °C. b, The relationship
between soil pH at 0.5 m and the log-ratio of exchangeable calcium (Cax) to exchangeable aluminium (Alx), which is thought to control the pH of
gibbsite-buffered soils. The yellow line is the fit by least-squares regression (b0 = 4.96, b1 = 0.32, R² = 0.36, P < 0.01).
Extended Data Figure 2 | Results of spatial resampling. Transparent points show a spatial sample of 20,000 measurements (a and b) and a random
sample of 20,000 measurements (c and d). a, c, pH at 0.5 m depth versus MAP minus PET. b, d, The geographic distribution of measurements in the
Americas.
pH versus MAP minus PET estimated using the Priestley-Taylor method
Extended Data Figure 4 | Soil pH at 0.1 m depth versus MAP minus
PET. Transparent points show a spatial sample of 20,000 measurements of 
soil pH at 0.1 m depth. Side panels show histograms of MAP minus PET 
and soil pH, and yellow lines show predicted pH values of CaCO3-buffered 
soils (8.2) and Al(OH)3-buffered soils (5.1). 
Extended Data Figure 5 | Calcite and exchangeable aluminium versus
MAP minus PET. Transparent points represent a spatial sample of 20,000 
measurements from the NCSS database. a, Calcite (CaCO3) equivalents as 
mass percentage versus MAP minus PET. b, Exchangeable aluminium as 
a percentage of the effective cation exchange capacity versus MAP minus
PET. These data are not reported for all samples in the NCSS database, and 
so points on the plot represent only the subset of the data with reported 
values. 
Extended Data Figure 6 | Dry-climate soil pH versus seasonality,
relief and carbonates. Transparent points show soil pH at 0.5 m depth
in the driest quartile of MAP minus PET (n = 5,000). a, Soil pH versus
the coefficient of variation (CV) of precipitation. b, Soil pH versus local
relief. c, Violin plots showing soil pH versus carbonate lithology. Panels d
and e show the proportion of the observations with pH < 6.5, binned into
Extended Data Figure 7 | Wet-climate soil pH versus seasonality, relief
and carbonates. Transparent points show soil pH at 0.5 m depth in the
wettest quartile of MAP minus PET (n = 5,000). a, Soil pH versus the
coefficient of variation of precipitation. b, Soil pH versus local relief.
c, Violin plots showing soil pH versus carbonate lithology. Panels d and
e show the proportion of the observations with pH > 6.5, binned into
deciles of the variable on the x axis; panel f shows the proportion in each
lithologic category. Black lines show logistic regression fits, with associated
χ² statistics and P values from likelihood ratio tests for precipitation
CV (χ² = 3.5, P = 0.06), local relief (χ² = 61.29, P < 0.01) and carbonate
lithology (χ² = 156.41, P < 0.01). Dashed lines show the proportion of
observations with pH > 6.5.
Extended Data Table 1 | Soil profile data sets

Dataset | Provider | Profiles used in analysis (0.1 m / 0.5 m)
National Cooperative Soil Survey | United States Department of Agriculture Natural Resources Conservation Service | 34,259 / 35,775
Chinese National Soil Database | Chinese Soil Survey | 2,183 / 2,370
World Inventory of Soil Emission Potentials | International Soil Reference and Information Center | 2,505 / 2,682
Africa Soil Profile Database | International Soil Reference and Information Center | 10,057 / 11,680
Australian National Soil Database | Commonwealth Scientific and Industrial Research Organization | 4,389 / 5,959
Canadian National Soil Database | Canadian Soil Information Service | 6,035 / 8,219
Soil Profile Analytical Database of Europe/Measured Parameters | European Soil Data Center | 375 / 408
Brazilian National Soil Database | Luiz de Queiroz College of Agriculture | 488 / 807

Data sets marked with an asterisk are publicly available on request from the data provider. The other datasets are described in refs 45-48.
Dataset | Provider | Reference
GPCC Full Data Reanalysis v7.0 | Global Precipitation Climatology Center | 16
LandFlux-EVAL Synthesis | Institute for Atmospheric and Climate Science, ETH Zurich | 18
GLiM Global Lithologic Map | Institute for Geology, Universität Hamburg | 25
CERES Surface Radiation Budget v2.8 | United States National Aeronautics and Space Administration | 38
GEWEX Radiation Budget v3.0 | United States National Aeronautics and Space Administration | 39
CRU TS3.13 Global Surface Climatology | University of East Anglia Climate Research Unit | 40
ETOPO1 Digital Elevation Model | United States National Centers for Environmental Information | 41
MOD12 Land Cover Classes | United States National Aeronautics and Space Administration | 42
GLASS Leaf Area Index | Global Land Cover Facility | 43, 44

The models are described in refs 16, 18, 25, 38-44.
Broadening not strengthening of the Agulhas
Current since the early 1990s 


Lisa M. Beal¹ & Shane Elipot¹


Western boundary currents—such as the Agulhas Current in 
the Indian Ocean—carry heat poleward, moderating Earth’s 
climate and fuelling the mid-latitude storm tracks”. They could 
exacerbate or mitigate warming and extreme weather events in 
the future, depending on their response to anthropogenic climate 
change. Climate models show an ongoing poleward expansion and 
intensification of the global wind systems, most robustly in the 
Southern Hemisphere®®, and linear dynamical theory’ suggests 
that western boundary currents will intensify and shift poleward 
as a result*®. Observational evidence of such changes comes from 
accelerated warming and air-sea heat flux rates within all western 
boundary currents, which are two or three times faster than global 
mean rates. Here we show that, despite these expectations, the
Agulhas Current has not intensified since the early 1990s. Instead, 
we find that it has broadened as a result of more eddy activity. Recent 
analyses of other western boundary currents—the Kuroshio and East 
Australia currents—hint at similar trends!!-'5. These results indicate 
that intensifying winds may be increasing the eddy kinetic energy 
of boundary currents, rather than their mean flow. This could act 
to decrease poleward heat transport and increase cross-frontal 
exchange of nutrients and pollutants between the coastal ocean 
and the deep ocean. Sustained in situ measurements are needed to 
properly understand the role of these current systems in a changing 
climate. 

To estimate the trend in Agulhas Current transport we build a 
22-year proxy using three years of in situ measurements from the 
Agulhas Current Time-series (ACT) array'® combined with coincident 
along-track satellite altimeter data spanning the years 1993-2015 
(Fig. 1). 

We define two measures of transport for the Agulhas Current:
a streamwise, southwestward jet transport Tjet, and a geographically fixed,
net boundary-layer transport Tbox. Over the three years of in situ data
the mean and standard deviation of Tjet are −84 Sv (1 Sv = 10⁶ m³ s⁻¹)
and 24 Sv, respectively, and of Tbox are −77 Sv and 32 Sv, respectively.
In past studies Tbox has been more often applied to quantify boundary
current flow, yet this measure suffers spurious effects from meander
events, which are largely removed in the streamwise case.

Before building a proxy we first test for the necessary condition of 
a linear and fixed relationship between our in situ transports and sea 
surface slope from satellite altimetry. Sea surface slope is proportional
to surface geostrophic velocity, and on the basis of previous analyses 
we expect the relationship between surface geostrophic velocity and 
full-depth transport to be strong in the Agulhas Current, despite 
the presence of an undercurrent!”!8. Using empirical orthogonal 
function (EOF) analysis, we find that transport and sea surface height
and slope along the ACT array do exhibit similar and significantly
correlated modes of variance (correlations > 0.7, P values < 10⁻³). In
each case, these modes express weakening or strengthening, broadening
or narrowing, and meandering of the Agulhas Current jet
(Extended Data Fig. 1 and Methods). Hence, variance of the sea surface
is strongly tied to oceanic transport along the ACT array at 34° S, and
using altimetry to build a proxy for Agulhas Current transport seems
physically justifiable.
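The link between sea surface slope and surface velocity is geostrophic balance: v = (g/f) ∂η/∂x, where η is sea level, x is distance along the direction of the slope, and f = 2Ω sin(latitude) is the Coriolis parameter. The sketch below evaluates this relation. The function names are hypothetical, and sign conventions (track orientation, f < 0 in the Southern Hemisphere) must be set for the actual ground track.

```python
import math

G = 9.81           # gravitational acceleration, m s^-2
OMEGA = 7.2921e-5  # Earth's rotation rate, s^-1


def coriolis_parameter(lat_deg):
    """f = 2 * Omega * sin(latitude), in s^-1 (negative south of the Equator)."""
    return 2.0 * OMEGA * math.sin(math.radians(lat_deg))


def surface_geostrophic_velocity(slope, lat_deg):
    """Velocity (m/s) normal to the direction along which slope = d(eta)/dx is taken.

    slope is dimensionless (metres of sea level per metre of distance).
    """
    return G / coriolis_parameter(lat_deg) * slope
```

At 34° S, a sea level change of 1 m over 100 km (slope 10⁻⁵) corresponds to a surface geostrophic speed of about 1.2 m s⁻¹.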

For our Agulhas Current transport proxy we build nine linear regres- 
sion models between sea surface slope and transport per unit distance 
Tx, one at each of the ACT mooring locations (Fig. 1). We then fit 
a smoothed function at 1-km intervals and integrate horizontally to 
obtain Tjet and Tbox for each altimeter pass. In this way, we can account
for a current that meanders and changes in width by allowing the core
and flanks of the flow to vary at different rates, although the vertical
structure necessarily remains fixed. A proxy based on regression
between array-wide sea level variance and total transport is more
statistically successful for Tbox; however, we find that this proxy’s
trend is spurious because of the observed broadening of the current 
(Extended Data Fig. 2 and Methods). 
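The per-mooring regression step can be sketched as below. The sketch makes two assumptions. Slopes and in situ Tx are available at coincident times as (time × mooring) arrays. Linear interpolation onto the 1-km grid stands in for the smoothed function used in the text, whose exact form is not specified here. Function names are hypothetical.

```python
import numpy as np


def fit_mooring_regressions(slopes, transports):
    """Fit Tx = a + b * slope separately at each mooring.

    slopes, transports: arrays of shape (n_times, n_moorings) sampled at
    coincident times (altimeter passes during the in situ record).
    Returns (intercepts, gains), each of length n_moorings.
    """
    slopes = np.asarray(slopes, dtype=float)
    transports = np.asarray(transports, dtype=float)
    n_moorings = slopes.shape[1]
    intercepts = np.empty(n_moorings)
    gains = np.empty(n_moorings)
    for k in range(n_moorings):
        gains[k], intercepts[k] = np.polyfit(slopes[:, k], transports[:, k], 1)
    return intercepts, gains


def proxy_tx_profile(slopes_now, intercepts, gains, mooring_km, grid_km):
    """Predict Tx at the moorings for one pass and interpolate onto grid_km."""
    tx_moorings = intercepts + gains * np.asarray(slopes_now, dtype=float)
    return np.interp(grid_km, mooring_km, tx_moorings)
```

Because each mooring has its own gain and intercept, the core and flanks of the current can respond differently to the same altimeter pass.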

Our 22-year proxy time series are shown in Fig. 2, together with their 
frequency spectra and seasonal cycles. For Tbox our proxy explains 61%
of the variance during the ACT period, while for Tjet it explains 55% of
the variance. The jet and boundary layer transports have spectral peaks 
that are significant (see Methods subsection ‘Spectral Estimates’) at the 
annual period (Fig. 2c) and their seasonal cycles match those of the 
in situ data!®, with weakest transports occurring during austral winter, 
in August, and strongest transports occurring throughout austral 
summer (Fig. 2d). At most timescales, the variance of Tjet is less than
that of Tbox (Fig. 2c and d).

The long-term trends are +1.0 ± 2.4 Sv per decade for Tbox and
+2.1 ± 2.1 Sv per decade for Tjet (where tolerances are the 95% confidence
interval). Because southwestward transport is negative, these positive trends
would correspond to a slight decrease in Agulhas Current transport, although
neither trend is significant. In
the Indian Ocean, a spin-up of the ocean circulation over the northern 
reaches of the subtropical gyre has been implied by a small intensifi- 
cation of the trade winds’ and an increase in the Indonesian through- 
flow’? since the early 1990s. Farther south at the ACT array the trend 
in wind stress curl is less certain, since changes in the strength and 
latitudinal position of the Westerlies differ between wind products”’. 
Nevertheless, the derived winds from the 20CRv2 reanalysis show an 
upward trend in wind curl over all the southern subtropical oceans”!, 
and rapidly increasing sea surface temperatures and air-sea fluxes have 
been attributed to a strengthening and poleward shift of the Agulhas 
Current>”"°. 
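A linear trend per decade with a ± half-width can be sketched as follows. This is a simplified stand-in. It uses ordinary least squares and a normal-approximation interval, and it ignores serial correlation. For autocorrelated transport proxies, an effective sample size would be needed, and that would widen the interval. It is not necessarily how the quoted 95% intervals were obtained. The function name is hypothetical.

```python
import numpy as np


def decadal_trend(t_years, y, z=1.96):
    """OLS trend in units of y per decade and a +/- half-width (z * standard error)."""
    t = np.asarray(t_years, dtype=float)
    y = np.asarray(y, dtype=float)
    tc = t - t.mean()
    slope = np.sum(tc * (y - y.mean())) / np.sum(tc ** 2)
    residuals = y - y.mean() - slope * tc
    sigma2 = np.sum(residuals ** 2) / (y.size - 2)  # residual variance, n - 2 d.f.
    se = np.sqrt(sigma2 / np.sum(tc ** 2))
    return 10.0 * slope, 10.0 * z * se
```

With southwestward transport counted as negative, a positive value returned here means a weakening current.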

Given that our results appear to contrast with these indicators, we 
examine our proxy trends more closely. We look at the structure of sea 
level and slope trends along the ACT array (Fig. 3a and b) and compare 
them to our transport trend. We find a non-uniform pattern of sea level 
rise across the Agulhas Current, with a minimum at 130 km from the 
coast (Fig. 1). This pattern gives rise to a negative trend in sea surface 
slope across the inshore half of the boundary layer and a positive trend 
offshore (Fig. 3b). Since the mean sea surface slope is positive across 
the Agulhas Current (Extended Data Fig. 1), these trends result in 
weaker velocities within the core of the current and stronger velocities 
over its flank. The effect on the Agulhas Current is a weakening and 
broadening of the jet, as illustrated by the change in Tx across the array


¹Rosenstiel School of Marine and Atmospheric Science, University of Miami, 4600 Rickenbacker Causeway, Miami, Florida, USA.
Figure 1 | Agulhas Current Time-series (ACT) instrumental array
and mean Agulhas Current from April 2010 to February 2013.
a, Geographical location of ACT mooring array, consisting of current
meter moorings A-G and CPIES (Current Pressure Inverted Echo
Sounders) sites P2-P5. Bathymetry down to 1,000 m is shown in tan and
in deepening shades of blue thereafter, with contours every 200 m and
thick contours every 1,000 m, from 200 m to 5,000 m. b, Vertical section
of the mooring array superposed on the 3-year mean cross-track velocity.
Bathymetry is shaded grey. Southwestward velocities (Agulhas Current)
are shaded yellow through to red and northeastward velocities are shaded
blue (see colour scale). Acoustic Doppler Current Profilers (ADCPs)
measure velocity throughout the upper water column. Aquadopp current
meters measure velocity at a single point. Pairs of CPIES estimate profiles
of geostrophic velocity.
Figure 2 | Agulhas Current transport proxies based on regressions
between transport and sea surface slope at each ACT mooring location.
a, Proxy time series for jet or streamwise transport Tjet. b, Proxy time
series for boundary layer transport Tbox. The three years of in situ
transports from the ACT array are shown as black lines. c, Frequency
spectra of the jet (red) and boundary layer (grey) proxies. The 95%
confidence interval is shown. d, Seasonal transport (daily average values)
of the jet (red) and boundary layer (grey) proxies. Shading shows 95%
confidence intervals.
Figure 3 | Trends of sea level and oceanic transport across the Agulhas
Current, showing broadening. a, b, Sea level (a) and sea surface slope
(b) linear trends estimated using along-track satellite altimeter data from
Aviso (blue) and CCI (orange) products (1993-2015). c, Transport
per unit distance based on regressions between sea surface slope and
oceanic transport at each ACT mooring site (circles). Transport changes
between the beginning (solid lines) and end (dashed lines) of the altimeter
record are implied by the linear trends. Tx is net transport per unit distance
and Tx,sw is its southwestward component (as used to calculate Tjet).
Vertical lines indicate the width of the boundary layer at the beginning
(solid) and end (dashed) of the altimeter record.


from the regression models at each mooring (Fig. 3c). The core of the 
current is in the same position, but weaker, while flow throughout the 
offshore flank of the current is stronger. The resultant broadening of 
the boundary layer, defined by the zero crossing of Tx, is about 50 km
(Fig. 3c). Hence, we conclude that the Agulhas Current is weakening 
and broadening over time, while its total transport remains stable. 

Broadening of the Agulhas Current can be understood in the context 
of a simple Munk model’, whereby the width of the western boundary 
layer will increase if the lateral viscosity increases. This could occur as 
a result of an increase in eddy activity in the current over time (Fig. 4). 
We derive the trend in eddy kinetic energy (EKE) of the Agulhas 
Current at this latitude from a mapped altimeter product using 
a fixed number of satellites over time, as has been done previously””. 
Consistent with this model, the trend in EKE is everywhere positive 
across the current, while the peak in mean kinetic energy (MKE) within 
the core of the jet is decreasing (Fig. 4). 
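In Munk's model, the western boundary layer width scales as δM = (AH/β)^(1/3). Here AH is the lateral viscosity and β = 2Ω cos(latitude)/R is the meridional gradient of the Coriolis parameter, so doubling AH widens the layer by a factor of 2^(1/3) ≈ 1.26. The sketch below computes this scale. It also splits a velocity record into mean and eddy kinetic energy, treating variability at timescales longer than about eighteen months (a centred 549-day running mean for daily data) as the mean flow. The running-mean filter, the window length and the function names are assumptions. The paper's kinetic energies come from a mapped altimeter product whose processing is not reproduced here.

```python
import numpy as np

OMEGA = 7.2921e-5         # Earth's rotation rate, s^-1
EARTH_RADIUS_M = 6.371e6  # mean Earth radius, m


def munk_width_m(lateral_viscosity, lat_deg):
    """Munk boundary-layer width scale (A_H / beta)**(1/3), in metres."""
    beta = 2.0 * OMEGA * np.cos(np.radians(lat_deg)) / EARTH_RADIUS_M
    return (lateral_viscosity / beta) ** (1.0 / 3.0)


def kinetic_energies(u, v, window=549):
    """Return (TKE, MKE, EKE) per unit mass from velocity series u, v (m/s).

    The mean flow is a centred running mean over `window` samples; the eddy
    part is the residual. Only samples where the running mean is defined are used.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so the running mean is centred")
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    kernel = np.ones(window) / window
    u_bar = np.convolve(u, kernel, mode="valid")
    v_bar = np.convolve(v, kernel, mode="valid")
    half = window // 2
    u_c = u[half:half + u_bar.size]  # samples aligned with each running-mean centre
    v_c = v[half:half + v_bar.size]
    tke = 0.5 * np.mean(u_c ** 2 + v_c ** 2)
    mke = 0.5 * np.mean(u_bar ** 2 + v_bar ** 2)
    eke = 0.5 * np.mean((u_c - u_bar) ** 2 + (v_c - v_bar) ** 2)
    return tke, mke, eke
```

If stronger eddy activity acts as an enhanced effective lateral viscosity, the Munk scale implies a broader boundary current. This is consistent with rising EKE alongside a weaker MKE peak in the jet core.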

Broadening can also be understood in terms of meandering. 
Mesoscale meanders dominate the variance—and therefore the 
EKE—of the Agulhas Current”’, growing largely through a barotropic 
conversion of energy from the mean horizontal flow'”'®. Such 
instabilities act to transfer energy offshore and decelerate the core of 
the jet™*. Hence, a reported increase in the number of meander events 
over time’” will lead to a weakening and broadening of the flow. 

Measurements in other western boundary currents also point to 
trends of increasing EKE, rather than increasing transports, as we 
observe in the Agulhas Current. In the Pacific Ocean, there is an 
Figure 4 | Kinetic energy analyses across the Agulhas Current, showing
increased eddying. a, Total kinetic energy (TKE), mean kinetic energy
(MKE) and eddy kinetic energy (EKE) from the mapped Aviso altimeter
product. Mean and eddy kinetic energies are defined as variability at
timescales of more and less than eighteen months, respectively. b, TKE,
MKE and EKE linear changes over 22 years. In a and b the mean position
of the current core and its offshore flank during the three years of ACT are
shown as vertical grey lines. Data within 40 km of the coast are not available
in the mapped altimeter product.


increase in observed trade winds over the last twenty years, some of 
which is attributed to climate change”. Increases in sea surface height 
over the north and south subtropical Pacific point to a concomitant 
intensification of the ocean gyres*. However, no compensating trends 
have been found in the transports of the Kuroshio or East Australia 
currents!*!4, even while they have been warming!°. Instead, there 
is evidence that the Kuroshio Current is broadening, with the same 
pattern of sea level rise as seen across the Agulhas Current’!, and 
that eddy activity in both the Kuroshio and East Australia currents 
is increasing'?-"*. Furthermore, eddy variability of the East Australia 
Current has recently been linked to regional wind stress curl!°. In the 
Atlantic, prediction of trends in Gulf Stream intensity is complicated by 
a potential weakening of the overturning circulation” and by uncertain 
trends in wind curl. Sea level changes along the east coast of the USA
have been used to suggest a weakening of the Gulf Stream’, but this 
is not corroborated by in situ measurements, which show no trend.

Extending in situ observations back in time, and inferring ocean
circulation changes, using satellite altimeter
data is becoming commonplace'*'®”’, in an attempt to understand 
oceanic change better despite a paucity of measurements. Our results 
call for caution when inferring trends in currents using sea surface 
height difference alone, since sea level changes may be inhomogeneous 
across the current (Fig. 3). The exact position of the current, and any 
broadening or narrowing over time, must be taken into account 
(see Methods). On multidecadal timescales, the implicit assumption 
of a fixed vertical stratification may also become problematic, as 
thermohaline changes become important. More hydrographic data are 
necessary, particularly within western boundary currents, to be able to 
estimate trends in stratification. 

A particular weakness of our analysis is that it captures only changes 
in the Agulhas Current at 34° S, the location of the ACT array. Farther 
north, opposing trends in MKE and EKE are such that the mean flow 
appears to be strengthening over time and there are fewer eddies.
However, these inferences are made where the current core is within 
40 km of the coast, a region of poor altimeter coverage, and are uncor- 
roborated by in situ measurements. Another weakness is that wind 
products exhibit large discrepancies in the mean”, such that temporal 
changes are even less reliable”°, and a poleward shift adds further 
uncertainty to any observed trend in wind stress curl°. Finally, a linear 
trend model accounts for only a small fraction of the variance in kinetic 
energy and transport of the Agulhas Current. Twenty-two years of satellite
observations are barely sufficient to discern anthropogenic trends, 
although decadal climate variability in the Indian Ocean is smaller 
than in the Pacific and Atlantic oceans. Fifty-year trends in sea surface
temperature in the Indian Ocean sector are consistent with trends over 
the last two decades’. 

Our results, together with recent analyses in other western boundary 
currents, suggest that intensifying winds may act to increase the EKE 
of boundary currents, rather than their mean flow. This hypothesis 
draws parallels with the eddy compensation hypothesis for the 
Antarctic Circumpolar Current in the Southern Ocean, where eddies 
appear to dampen the effect of increased wind energy input on the 
mean flow””. In essence, while winds tend to accelerate the flow and 
steepen isopycnals, eddies mix laterally across the current to slump 
the isopycnals. Coupling between eddies and the atmosphere has 
also been shown to influence this frontal balance*!. The implication 
of broadening boundary currents is a more porous divide between 
the continental shelves and the open ocean, leading to greater mixing 
and cross-frontal exchange. In the Agulhas Current, these changes 
could also enhance upwelling over the shelf, since the strongest 
upwelling events are driven by meanders. These implications are in 
contrast to those of an intensifying flow, which would tend to dampen 
cross-frontal mixing and increase meridional heat transport. 

If western boundary currents are not strengthening, observed 
patterns of surface warming~”’° must be explained by a poleward shift 
of the ocean gyres. Broadening of western boundary currents and their 
extensions will also imprint on warming patterns and this should be 
considered in future analyses. Ocean reanalysis products and climate 
models fail to resolve western boundary currents and this could explain 
discrepancies among them and with our results*"®. 
METHODS


Mooring array and transports. Our in situ data are from the ACT mooring array, deployed across the Agulhas Current and along a satellite altimeter ground track near 34° S between April 2010 and February 2013 (Fig. 1). The array is 300 km long and oriented 15° clockwise from normal to the mean flow. Seven full-depth current-meter moorings and four current- and pressure-sensor-equipped inverted echo sounders (CPIES) were able to capture the full Agulhas jet at all times, including during meander events. CPIES pairs provide a cost-effective estimate of full-depth geostrophic flow at the offshore end of the array. Cross-track velocity profiles are horizontally interpolated to a resolution of 1 km and integrated vertically to obtain transport per unit distance, T_x. We then define the western boundary jet transport T_jet as the southwestward component of T_x integrated to the first maximum of T_x beyond the half-width of the mean jet (110 km). We define boundary layer transport T_box as T_x integrated out to 219 km, the three-year mean width of the jet (Fig. 1).
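The two transport definitions above can be sketched in Python/NumPy as follows. This is an illustrative sketch, not the authors' MATLAB code: it assumes T_x is given on the 1-km cross-track grid in Sv per km, with distance measured from the inshore end of the array, and that southwestward (Agulhas) flow is negative. The function names are hypothetical, and the "first maximum" is taken here as the first strict local maximum of T_x offshore of 110 km.

```python
import numpy as np


def integrate_trapezoid(y, x):
    """Trapezoidal integral of y over x (x in km, y in Sv per km -> Sv)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def boundary_layer_transport(x_km, tx, width_km=219.0):
    """T_box: net T_x integrated from the inshore end out to width_km."""
    keep = x_km <= width_km
    return integrate_trapezoid(tx[keep], x_km[keep])


def jet_transport(x_km, tx, half_width_km=110.0):
    """T_jet: southwestward (assumed negative) part of T_x, integrated out to
    the first local maximum of T_x beyond half_width_km, or to the array end
    if no such maximum exists."""
    end = len(tx) - 1
    for i in range(1, len(tx) - 1):
        if x_km[i] > half_width_km and tx[i - 1] < tx[i] >= tx[i + 1]:
            end = i
            break
    southwestward = np.minimum(tx[: end + 1], 0.0)
    return integrate_trapezoid(southwestward, x_km[: end + 1])


# Usage on a synthetic 1-km profile (made-up values, not ACT data)
x = np.arange(0.0, 301.0)            # distance from the inshore end, km
tx = np.full_like(x, -0.1)           # uniform southwestward flow, Sv per km
tx[x >= 180] = 0.05                  # weak return flow offshore of 180 km
print(boundary_layer_transport(x, tx), jet_transport(x, tx))
```

Because T_box keeps the net flow while T_jet keeps only the southwestward part, a return flow inside 219 km lowers the magnitude of T_box but not of T_jet.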
Absolute dynamic topography. We use the 1-Hz unfiltered along-track absolute dynamic topography (ADT) product from Aviso. ADT is the sum of a mean dynamic topography and a sea level anomaly. The latest Aviso product uses a 20-year reference period from 1993 to 2012 to define the mean dynamic topography. Using sea level anomaly instead of ADT does not change our statistical analyses and regressions, since the two quantities differ only by a constant. In the main text we refer to ADT as sea surface height or sea level, although strictly the definition of ADT differs from sea surface height by a constant geoid. The ACT line is along altimeter track number 96, successively occupied by satellites TOPEX/Poseidon (1992-2002), Jason-1 (2002-2008), and currently Jason-2 (since 2008). During the ACT experiment, there were 105 satellite passes across the Agulhas Current, providing data between 14.6 km and 306 km offshore with a horizontal resolution of about 6.2 km. Data are available as close as 8.45 km to the coast, but we discard the points inshore of 14.6 km because they are missing 30% of the time. We estimate ADT and its slope every 1 km along the track using a local order-one polynomial regression estimator with a 24-km half-bandwidth and an Epanechnikov kernel. Estimating slope in this way introduces less noise than differentiation, and the Epanechnikov kernel minimizes the asymptotic mean square error of the resulting estimates. A half-bandwidth of 24 km corresponds to a total window length that approaches the horizontal along-track decorrelation length scale of the flow at the ACT array (56 km). Varying this bandwidth by 50% does not significantly modify our results.
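A minimal sketch of the order-one local polynomial (local linear) estimator with an Epanechnikov kernel, returning both the smoothed ADT and its along-track slope from one weighted least-squares fit per output point. The function names, the synthetic along-track values and the NaN handling are assumptions for illustration. The slope is the fitted b1 coefficient, which is why no separate differentiation step is needed.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel, zero outside |u| <= 1."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def local_linear(x_obs, y_obs, x_eval, half_bandwidth):
    """Order-one local polynomial regression with an Epanechnikov kernel.

    At each x0 in x_eval, fits y ~ b0 + b1 * (x - x0) by weighted least
    squares; b0 is the smoothed level (ADT) and b1 the slope. Returns NaN
    where fewer than two observations fall inside the kernel support.
    """
    x_obs = np.asarray(x_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    valid = np.isfinite(y_obs)                 # skip missing along-track data
    x_obs, y_obs = x_obs[valid], y_obs[valid]
    x_eval = np.asarray(x_eval, dtype=float)
    level = np.full(x_eval.shape, np.nan)
    slope = np.full(x_eval.shape, np.nan)
    for j, x0 in enumerate(x_eval):
        dx = x_obs - x0
        w = epanechnikov(dx / half_bandwidth)
        if np.count_nonzero(w) < 2:
            continue
        design = np.column_stack([np.ones_like(dx), dx])
        sw = np.sqrt(w)                        # weighted LS via row scaling
        beta, *_ = np.linalg.lstsq(design * sw[:, None], y_obs * sw, rcond=None)
        level[j], slope[j] = beta
    return level, slope


# Usage: samples every ~6.2 km, estimates every 1 km, 24-km half-bandwidth
# (synthetic ADT values in metres, distance in km).
x_track = np.arange(14.6, 306.0, 6.2)
adt = 0.9 - 0.004 * x_track + 0.02 * np.sin(x_track / 40.0)
x_grid = np.arange(40.0, 280.0, 1.0)
adt_hat, adt_slope = local_linear(x_track, adt, x_grid, half_bandwidth=24.0)
```

A local linear fit reproduces a straight line exactly, so any bias in the slope comes only from curvature within the ±24-km window.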
Transport proxies. To test the assumption of a linear and fixed relationship between in situ transport and sea surface height and slope for the Agulhas Current, we compare the combined eigenmodes of variance of ADT and ADT slope over the full 22-year record to those of T_x over our 3 years of measurements. In each case we find four eigenmodes, of similar spatial structure, that each explain 10% or more of the variance (Extended Data Fig. 1). The dominant mode of variance at the sea surface is a broad-scale decrease or increase in sea surface slope, which is associated with a weakening or strengthening of the in situ transport. This is demonstrated by a strong correlation between the Principal Component time series of the first ADT mode and that of the second mode of T_x (correlation 0.76). A narrowing or broadening of the jet is reflected in the second mode of ADT and the first mode of T_x (correlation 0.84). The remaining third and fourth modes of variance in each case reflect meandering of the jet (correlations 0.72 and 0.71, respectively), while all four eigenmodes project onto mesoscale meander events. In all cases we find
P values smaller than 10~* and hence all correlations are highly significant. Because 
the Principal Component time series of T_x are serially correlated and not normally distributed, we used a nonparametric resampling method to calculate one-tailed P values for the magnitude of correlation. Given the similarities between eigenmodes and the significance of correlations, we conclude that a linear relationship between sea surface variance and full-depth Agulhas transport is a fair assumption at 34° S, where the undercurrent is weak. Hence, the variance of the current appears equivalent barotropic, in agreement with previous analyses.
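The combined eigenmode comparison can be sketched with a singular value decomposition. This assumes every field is sampled at common times, and it normalizes each field by its overall standard deviation so that ADT and ADT slope carry comparable weight; the Methods do not state the normalization, so that choice and the helper names are hypothetical. Because the sign of an EOF/PC pair is arbitrary, correlations between PCs are best compared in magnitude.

```python
import numpy as np


def combined_eofs(fields, n_modes=4):
    """Combined EOF analysis via the singular value decomposition.

    fields: list of arrays shaped (n_times, n_points_i), all sampled at the
    same times (e.g. ADT and ADT slope). Each field is demeaned in time and
    divided by its overall standard deviation (assumed normalization).
    Returns PCs (n_times, n_modes), EOFs (n_modes, total points) and the
    fraction of total variance explained by each returned mode.
    """
    blocks = []
    for f in fields:
        anom = np.asarray(f, dtype=float) - np.mean(f, axis=0)
        blocks.append(anom / anom.std())
    data = np.hstack(blocks)
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    frac = s**2 / np.sum(s**2)
    pcs = u[:, :n_modes] * s[:n_modes]         # principal component time series
    return pcs, vt[:n_modes], frac[:n_modes]


def pc_correlation(pc_a, pc_b):
    """Pearson correlation between two principal component time series."""
    return float(np.corrcoef(pc_a, pc_b)[0, 1])
```

In use, one would call `combined_eofs([adt, adt_slope])` on the altimeter record and `combined_eofs([tx])` on the mooring transports, then correlate PCs over the overlapping passes.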
For our preferred proxy we build nine regression models, one at each current-meter mooring and CPIES pair along the ACT array, which linearly relate the local T_x to the slope of ADT. Estimating ADT slope requires careful determination of the horizontal length scale of the flow at each site. We achieve this by again using an order-one local polynomial regression estimator with an Epanechnikov kernel to estimate both ADT and its slope, but this time varying the spatial bandwidth of the estimator to maximize the correlation between our measured T_x and the slope estimate at each mooring. This gives length scales ranging from 27 km at mooring B to 102 km at mooring G, consistent with our physical expectation of increasing length scale with increasing distance offshore. The polynomial regression estimator allows us to calculate the error variance of ADT and slope estimates at each mooring, based on the measurement errors in along-track ADT reported by Aviso. Reassuringly, we find that these error variances are at least an order of magnitude smaller than the variance of the resulting ADT and slope time series. Next, we remove outlying slope estimates by discarding the upper and lower 0.25% of the data distribution. To fill gaps we use multivariate regression between ADT slope at the location with missing data and at surrounding locations. This technique recovers 90% of the variance of the missing data, except at mooring A, where only 68% of the variance can be explained using adjacent records. Finally, we build linear regression models between ADT slope and T_x at each of the nine mooring sites. The R² statistics of these models vary from 0.51 at mooring A to 0.81 at CPIES pair P4 and P5. The strongly sloping sea bed and Agulhas undercurrent (Fig. 1) probably contribute to the poorer skill of the regressions at the inshore moorings.
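The per-mooring steps (trimming the outer 0.25% of slope estimates, filling gaps by multivariate regression on neighbouring sites, and a linear fit of T_x on ADT slope with its R²) might look like this in NumPy. The helper names and the synthetic example are hypothetical, and the gap-filling sketch assumes the neighbouring slope records are complete at the times being filled.

```python
import numpy as np


def trim_tails(x, frac=0.0025):
    """Set values below the frac quantile or above the 1 - frac quantile to NaN."""
    lo, hi = np.nanquantile(x, [frac, 1.0 - frac])
    out = np.array(x, dtype=float)
    out[(out < lo) | (out > hi)] = np.nan
    return out


def fill_gaps(target, predictors):
    """Fill NaNs in target by OLS on simultaneous values at other sites.

    predictors: (n_times, n_sites). Rows where any predictor is NaN stay NaN.
    """
    target = np.array(target, dtype=float)
    X = np.column_stack([np.ones(len(target)), predictors])
    rows_ok = np.all(np.isfinite(X), axis=1)
    train = rows_ok & np.isfinite(target)
    coef, *_ = np.linalg.lstsq(X[train], target[train], rcond=None)
    gaps = rows_ok & ~np.isfinite(target)
    target[gaps] = X[gaps] @ coef
    return target


def fit_linear(slope, tx):
    """OLS fit tx ~ a + b * slope; returns (a, b, R^2)."""
    ok = np.isfinite(slope) & np.isfinite(tx)
    b, a = np.polyfit(slope[ok], tx[ok], 1)    # polyfit returns [b, a]
    resid = tx[ok] - (a + b * slope[ok])
    r2 = 1.0 - np.sum(resid**2) / np.sum((tx[ok] - tx[ok].mean()) ** 2)
    return a, b, r2


# Synthetic example at one hypothetical mooring (not ACT data)
rng = np.random.default_rng(42)
slope_here = rng.normal(size=400)
slope_nbrs = np.column_stack(
    [slope_here + rng.normal(scale=0.3, size=400) for _ in range(2)])
slope_here[::37] = np.nan                      # gaps to be filled from neighbours
filled = fill_gaps(trim_tails(slope_here), slope_nbrs)
tx_here = 1.5 + 4.0 * filled + rng.normal(scale=1.0, size=400)
a, b, r2 = fit_linear(filled, tx_here)
```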

From the results of the regressions, we obtain T_x at each of the nine mooring locations for each satellite pass over 22 years. Subsequently, to calculate total transports from these discrete points, we fit a shape-preserving piecewise cubic Hermite interpolating polynomial function to obtain T_x at 1-km intervals across the current. We then integrate to obtain T_jet and T_box for each altimeter pass, as defined above (Fig. 2). Using instead a piecewise linear interpolation of T_x between moorings and applying a low-pass filter with a cutoff of 56 km (the decorrelation length scale given by the ACT measurements) to produce a smooth function gives alternative estimates with root-mean-square differences of 4.6 Sv for T_jet and 1.3 Sv for T_box.
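A shape-preserving piecewise cubic Hermite interpolant followed by trapezoidal integration on the 1-km grid is sketched below in plain NumPy. It uses the weighted-harmonic-mean derivatives and three-point end conditions of MATLAB's `pchip`; that this matches the authors' implementation is an assumption, as are the site positions and the helper name `profile_transport`. If SciPy is available, `scipy.interpolate.PchipInterpolator` provides a comparable interpolant.

```python
import numpy as np


def _end_slope(h0, h1, del0, del1):
    """Shape-preserving three-point derivative at an end node."""
    d = ((2.0 * h0 + h1) * del0 - h0 * del1) / (h0 + h1)
    if np.sign(d) != np.sign(del0):
        d = 0.0
    elif np.sign(del0) != np.sign(del1) and abs(d) > abs(3.0 * del0):
        d = 3.0 * del0
    return d


def pchip_slopes(x, y):
    """Node derivatives: weighted harmonic mean of adjacent secants, zero at
    local extrema. Needs at least three nodes."""
    h = np.diff(x)
    delta = np.diff(y) / h
    d = np.zeros(len(x))
    for k in range(1, len(x) - 1):
        if delta[k - 1] * delta[k] > 0:
            w1 = 2.0 * h[k] + h[k - 1]
            w2 = h[k] + 2.0 * h[k - 1]
            d[k] = (w1 + w2) / (w1 / delta[k - 1] + w2 / delta[k])
    d[0] = _end_slope(h[0], h[1], delta[0], delta[1])
    d[-1] = _end_slope(h[-1], h[-2], delta[-1], delta[-2])
    return d


def pchip(x, y, xi):
    """Evaluate the shape-preserving cubic Hermite interpolant at xi."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xi = np.asarray(xi, dtype=float)
    d = pchip_slopes(x, y)
    k = np.clip(np.searchsorted(x, xi) - 1, 0, len(x) - 2)
    h = x[k + 1] - x[k]
    t = (xi - x[k]) / h
    # Cubic Hermite basis functions on each interval
    return ((1 + 2 * t) * (1 - t) ** 2 * y[k] + t * (1 - t) ** 2 * h * d[k]
            + t ** 2 * (3 - 2 * t) * y[k + 1] + t ** 2 * (t - 1) * h * d[k + 1])


def profile_transport(x_sites_km, tx_sites, width_km):
    """Interpolate T_x from the sites to a 1-km grid and integrate it
    (trapezoid rule) from the first site out to width_km."""
    grid = np.arange(x_sites_km[0], width_km + 0.5, 1.0)
    tx_grid = pchip(x_sites_km, tx_sites, grid)
    return float(np.sum(0.5 * (tx_grid[1:] + tx_grid[:-1]) * np.diff(grid)))
```

The shape-preserving choice avoids the overshoots a natural cubic spline can produce between widely spaced moorings, so no spurious extrema are introduced in T_x.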

A more common methodology is to build a proxy by regressing total transport onto a broad-scale sea surface slope. Regression based directly on along-track ADT at the ACT array is problematic, because sea surface height covaries strongly along the length of the array, leading to large uncertainties in the regression coefficients (even though the resulting R² statistics of such a proxy can be high). Hence, we build a regression model using the uncorrelated Principal Components of the combined EOFs of ADT and ADT slope from above (Extended Data Fig. 1). To include as many altimeter passes as possible for the regression model, we recompute these combined EOFs for data points farther than 42.6 km from the shore to avoid missing data. The resulting EOFs differ very little from the ones presented in Extended Data Fig. 1. The regression model for both the jet and boundary transport proxies T is then:


$$T = a_0 + \sum_{k \in \Omega} a_k A_k(t) \qquad (1)$$

where a_k are the regression coefficients estimated using the ordinary least-squares method and Ω is an ensemble of the Principal Component time series A_k(t). For the ensemble we consider each eigenmode, in order of decreasing amount of variance explained, and incorporate successively only those Principal Components that increase the adjusted R² statistic of our model (Equation (1)). The adjusted R² statistic quantifies the amount of variance explained by a multivariate model but, in contrast to the classic R² statistic, attempts to correct for the explained variance that occurs by chance from adding a random variate to the model. For T_box this procedure selects Principal Components 1 and 2 (correlated with transport at 0.85 and 0.22, respectively, with P values <10~? and 0.09), and for T_jet Principal Components 1, 2 and 3 (correlated at 0.49, 0.27 and 0.20, respectively, with P values <10°-, 0.015 and 0.05).
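The selection rule for the ensemble Ω can be sketched as a single forward pass over the PCs, using adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1) for n samples and p predictors. Starting from an intercept-only model (adjusted R² = 0) is an assumption, as are the function names; the returned coefficients are a_0 followed by the a_k of Equation (1) for the selected PCs.

```python
import numpy as np


def ols_adjusted_r2(y, predictors):
    """Fit y ~ a_0 + sum_k a_k * predictors[:, k] by ordinary least squares.

    Returns (adjusted R^2, coefficients [a_0, a_1, ...]).
    """
    n = len(y)
    design = np.column_stack([np.ones(n), predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    p = design.shape[1] - 1                    # predictors excluding a_0
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1), coef


def select_pcs(transport, pcs):
    """Consider PCs in order of decreasing explained variance and keep PC k
    only if adding it raises the adjusted R^2 of the model."""
    chosen, best = [], 0.0                     # intercept-only model scores 0
    for k in range(pcs.shape[1]):
        score, _ = ols_adjusted_r2(transport, pcs[:, chosen + [k]])
        if score > best:
            chosen, best = chosen + [k], score
    _, coef = ols_adjusted_r2(transport, pcs[:, chosen])
    return chosen, best, coef
```

Because a PC is retained only if it raises the adjusted R², PCs with weak correlations (such as 0.2) can still enter when the penalty for the extra predictor is small.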

The resulting proxy for T_box explains 76% of the variance, while in the case of T_jet the proxy explains only 39% of the variance (Extended Data Fig. 2). The poor performance of the jet proxy is probably because it is defined from only the southwestward flow, while sea level variance along the whole array reflects the net flow. Estimating the linear trends of the two time series, we find that the boundary layer transport appears to be strengthening at a significant rate (±95% confidence interval) of −4.9 ± 2.0 Sv per decade, while the jet transport has a weaker and insignificant trend of −1.4 ± 1.5 Sv per decade (Extended Data Fig. 2). The boundary layer proxy trend fulfils prior expectations, based on warming sea surface temperatures and strengthening winds over the region. However, this strengthening is inconsistent with trends in sea surface height across the current (Fig. 3). The structures and nodes of the eigenmodes of sea level and slope (Extended Data Fig. 1) are fixed in space and seem to be unable to capture the change in structure of the boundary layer over time. Throughout the 22-year record, the trend patterns never explain more than 8% of the local variance, and the EOFs are virtually unchanged when recomputed after detrending the data. Simply put, a proxy based on a single regression with total transport does not allow for a broadening of the current.
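A linear trend with a 95% interval, expressed per decade, can be sketched as an ordinary least-squares fit. The Methods do not say how the confidence interval was derived; this sketch assumes independent residuals and a normal approximation (1.96 standard errors), which will understate the uncertainty for serially correlated transport proxies, so it is illustrative only.

```python
import numpy as np


def linear_trend_per_decade(t_years, y):
    """OLS trend of y per decade and an approximate 95% half-width.

    Assumes independent, normally distributed residuals; serial correlation
    in real transport proxies would widen the interval (not handled here).
    """
    t = np.asarray(t_years, dtype=float)
    y = np.asarray(y, dtype=float)
    ok = np.isfinite(t) & np.isfinite(y)
    t, y = t[ok], y[ok]
    tc = t - t.mean()
    slope = np.sum(tc * (y - y.mean())) / np.sum(tc ** 2)   # per year
    resid = (y - y.mean()) - slope * tc
    se = np.sqrt(np.sum(resid ** 2) / (len(t) - 2) / np.sum(tc ** 2))
    return 10.0 * slope, 10.0 * 1.96 * se


# Usage with a synthetic proxy sampled every ~10 days for 22 years (made-up numbers)
rng = np.random.default_rng(0)
years = 1993.0 + np.arange(0.0, 22.0, 10.0 / 365.25)
proxy = -70.0 - 0.3 * (years - 1993.0) + rng.normal(scale=8.0, size=years.size)
trend, half_width = linear_trend_per_decade(years, proxy)
print(f"{trend:.1f} +/- {half_width:.1f} Sv per decade")
```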

Spectral estimates and annual cycle. Power spectral densities of the proxies T_jet and T_box are estimated using the adaptive multitaper method with five Slepian tapers. This method exhibits less variance than other common methods and is routinely employed for climate time series analysis. Using three or seven tapers, of the Slepian form or other forms, does not qualitatively change our results. Significant spectral peaks at the 95% confidence level were identified using a test for periodicities within a red background spectrum. Since T_jet has less energy than T_box at periods longer than 100 days (Fig. 2), we expect decadal signals and trends to be more clearly detected in the jet transport. Annual cycles are obtained by calculating daily means using the Nadaraya–Watson kernel estimator with a Gaussian kernel of half-width 30 days. 95% confidence intervals are based on the assumption of a normal distribution of the estimates.
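A simplified NumPy sketch of the spectral and annual-cycle estimators follows. It computes Slepian tapers from the standard tridiagonal eigenproblem and averages five eigenspectra with time-bandwidth product NW = 3 (so that 2NW − 1 = 5 tapers). Unlike the adaptive method described above, it uses equal weights and omits the red-noise significance test. The annual cycle uses a Nadaraya–Watson estimator with a Gaussian kernel on day of year, taking "half-width" to mean the kernel standard deviation and wrapping distances around the calendar year; both readings are assumptions.

```python
import numpy as np


def dpss_tapers(n, nw, k):
    """First k Slepian (DPSS) sequences of length n with time-bandwidth
    product nw, from the tridiagonal eigenproblem. Unit energy; signs arbitrary."""
    idx = np.arange(n)
    diag = ((n - 1 - 2 * idx) / 2.0) ** 2 * np.cos(2 * np.pi * nw / n)
    off = idx[1:] * (n - idx[1:]) / 2.0
    mat = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    _, vecs = np.linalg.eigh(mat)              # eigenvalues in ascending order
    return vecs[:, ::-1][:, :k].T              # k largest -> leading tapers


def multitaper_psd(x, dt, nw=3.0, k=5):
    """One-sided multitaper PSD: equal-weight average of k eigenspectra
    (no adaptive weighting). Units: x**2 per unit frequency."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    tapers = dpss_tapers(n, nw, k)
    eig = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
    psd = dt * eig.mean(axis=0)
    # Fold negative frequencies: double every bin except zero (and Nyquist).
    if n % 2 == 0:
        psd[1:-1] *= 2.0
    else:
        psd[1:] *= 2.0
    return np.fft.rfftfreq(n, dt), psd


def annual_cycle(day_of_year, values, half_width=30.0, period=365.25):
    """Nadaraya-Watson estimate of the mean annual cycle on days 1..365 with a
    Gaussian kernel; day-of-year distances wrap around the calendar year."""
    doy = np.asarray(day_of_year, dtype=float)
    v = np.asarray(values, dtype=float)
    grid = np.arange(1.0, 366.0)
    dist = grid[:, None] - doy[None, :]
    dist = (dist + period / 2.0) % period - period / 2.0
    w = np.exp(-0.5 * (dist / half_width) ** 2)
    return grid, (w @ v) / w.sum(axis=1)
```

The dense eigen-decomposition is adequate for the few hundred to a few thousand samples in a 22-year altimeter record; longer series would call for a tridiagonal solver.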

Trends in sea level and kinetic energy. Trends in sea level (Fig. 3) are calculated using along-track satellite altimeter data from Aviso and from the new Sea Level Climate Change Initiative (CCI) product, which is optimized for long-term signals. For kinetic energy across the Agulhas Current (Fig. 4) we use the post-2014 geostrophic velocity maps from Aviso, which are derived from a two-satellite, merged, delayed-time product. For long-term signals it is important to use the two-satellite product because it ingests a consistent amount of data over time, avoiding the introduction of sampling bias that may cause spurious trends in variance. Total kinetic energy (TKE) is defined as half the sum of the squared horizontal components of velocity. Mean kinetic energy (MKE) is calculated from the total velocity time series by low-pass filtering using a sliding quadratic window with a half-bandwidth of 18 months. Eddy kinetic energy (EKE) is calculated from the velocity residuals obtained by subtracting the low-pass-filtered velocities from the total velocities. These calculations result in an interannually evolving MKE, and an EKE that captures variability at timescales of less than 18 months. Although mapped satellite altimeter products have difficulty resolving boundary flows that are narrow and close to the coast, reassuringly we find that at the ACT array peak MKE occurs close to the mean in situ Agulhas Current core (Fig. 4a).
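The kinetic energy decomposition can be sketched for a single grid point. The Methods describe the low-pass filter only as a sliding quadratic window; this sketch reads that as a moving average with parabolic weights, renormalized at the ends and skipping NaNs, which is an assumption. With daily maps, an 18-month half-bandwidth would correspond to roughly 548 samples. MKE + EKE equals TKE only approximately, because the cross terms (u_bar·u' + v_bar·v') need not vanish at every time.

```python
import numpy as np


def parabolic_lowpass(series, half_width):
    """Moving average with parabolic weights 1 - (m / (half_width + 1))**2
    for offsets |m| <= half_width samples. NaNs are skipped and weights are
    renormalized near the ends (assumed edge treatment)."""
    s = np.asarray(series, dtype=float)
    offsets = np.arange(-half_width, half_width + 1)
    weights = 1.0 - (offsets / (half_width + 1.0)) ** 2
    out = np.full(len(s), np.nan)
    for i in range(len(s)):
        j = i + offsets
        inside = (j >= 0) & (j < len(s))
        vals, w = s[j[inside]], weights[inside]
        good = np.isfinite(vals)
        if good.any():
            out[i] = np.sum(w[good] * vals[good]) / np.sum(w[good])
    return out


def kinetic_energy_terms(u, v, half_width):
    """TKE, MKE and EKE (per unit mass) from velocity time series at one grid
    point: low-pass velocities define the mean flow, residuals the eddies."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    u_bar = parabolic_lowpass(u, half_width)
    v_bar = parabolic_lowpass(v, half_width)
    tke = 0.5 * (u ** 2 + v ** 2)
    mke = 0.5 * (u_bar ** 2 + v_bar ** 2)
    eke = 0.5 * ((u - u_bar) ** 2 + (v - v_bar) ** 2)
    return tke, mke, eke
```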

Code availability. MATLAB scripts used for our analyses and figures are available 
upon request from S.E. 

Data availability. The in situ mooring data from the ACT experiment (velocities 
from current meters and acoustic Doppler current profilers; sound speed, pressure 
and bottom velocity from CPIES) are archived with the NOAA National Centers for Environmental Information (https://www.ncei.noaa.gov), with accession numbers 0156669 and 0156605. Aviso along-track Absolute Dynamic Topography data are accessible through http://www.aviso.altimetry.fr/en/data/products/sea-surface-height-products/global/adt-h.html#c5139. Maps of absolute dynamic topography and absolute geostrophic velocities are accessible through http://www.aviso.altimetry.fr/en/data/products/sea-surface-height-products/global/madt-h-uv.html. The CCI altimeter along-track data correspond to the Fundamental Climate Data Record product (http://dx.doi.org/10.5270/esa-sea_level_cci-1993_2014-v_1.1-201512) generated by the Sea Level Climate Change Initiative project (http://www.esa-sealevel-cci.org/).
and sea surface slope are similar across the Agulhas Current. a, First four eigenmodes of the transport per unit distance from the ACT array. b, First four combined eigenmodes of sea surface height and slope, from altimetry. Note that the ADT gradient is positive across most of the array, but is shown as negative for comparison with the southward Agulhas Current transports in a. Black dotted lines depict mooring positions.
Extended Data Figure 2 | Agulhas Current transport proxies based on regressions of total transport with sea surface eigenmodes. a, Proxy for jet or stream-wise transport T_jet. b, Proxy for boundary layer transport T_box. The three years of in situ transports from the ACT array are shown as black lines. The trends of these two proxies are inconsistent, owing to observed broadening of the jet.
Chemical intervention in plant sugar signalling
increases yield and resilience 


Cara A. Griffiths, Ram Sagar, Yiqun Geng, Lucia F. Primavesi, Mitul K. Patel, Melissa K. Passarelli, Ian S. Gilmore, Rory T. Steven, Josephine Bunch, Matthew J. Paul & Benjamin G. Davis


The pressing global issue of food insecurity due to population 
growth, diminishing land and variable climate can only be addressed 
in agriculture by improving both maximum crop yield potential 
and resilience. Genetic modification is one potential solution,
but has yet to achieve worldwide acceptance, particularly for crops 
such as wheat. Trehalose-6-phosphate (T6P), a central sugar signal
in plants, regulates sucrose use and allocation, underpinning 
crop growth and development. Here we show that application
of a chemical intervention strategy directly modulates T6P levels 
in planta. Plant-permeable analogues of T6P were designed 
and constructed based on a ‘signalling-precursor’ concept for 
permeability, ready uptake and sunlight-triggered release of T6P 
in planta. We show that chemical intervention in a potent sugar 
signal increases grain yield, whereas application to vegetative tissue 
improves recovery and resurrection from drought. This technology 
offers a means to combine increases in yield with crop stress 
resilience. Given the generality of the T6P pathway in plants and 
other small-molecule signals in biology, these studies suggest that 
suitable synthetic exogenous small-molecule signal precursors can 
be used to directly enhance plant performance and perhaps other 
organism function. 

We designed a signalling-precursor strategy on the basis of release by light (Extended Data Fig. 1). Light-activated control is a potent modulation strategy in biology, allowing for temporal and spatial resolution surpassing that of standard genetic methods. Additionally, such resolution can be increased when combined with small-molecule chemical control. Potency is further increased when releasing a signalling molecule, as the effects of light-activated control are increased several-fold through the inherent amplification of signalling.

Hydrophilic or charged molecules do not readily enter plants unless transported. We therefore designed unnatural precursors (1-4) of T6P with groups to mask charge, increase hydrophobicity and also to induce release by light (Fig. 1a). Their construction (Fig. 1b) used different phosphorus chemistries: phosphoramidite chemistry to create P(III)-intermediates that were then oxidized to corresponding P(V)-phosphotriesters, or direct P(V)-phosphorylation chemistry (Fig. 1b). Regioselective access to the OH-6 group in trehalose exploited trimethylsilyl as a protecting group that is chemically orthogonal to the phosphotriester; 12 was prepared on a multigram scale. Phosphorylation (reaction with phosphoramidites 9-11 (refs 10, 11) followed by tBuOOH, or treatment with POCl₃ (ref. 13) followed by the addition of the appropriate alcohol) gave intermediates that were deprotected under mildly acidic conditions (see Supplementary Methods). Compounds 1-4 were all inactive against SnRK1 (SNF1-related kinase 1) (Extended Data Fig. 2).

Mass spectrometry, thin-layer chromatography and nuclear magnetic resonance spectroscopy (see Supplementary Methods, Supplementary Table 1 and Extended Data Fig. 1) showed release times (95% release, t₉₅) dependent on both light intensity and frequency under a range of conditions. Consistent with design, light-sensitive groups were differently susceptible. Precursor ortho-nitrobenzyl (oNB)-T6P 1, for example, generated T6P by light-activated release more rapidly at lower wavelengths, while compound mono-dimethoxy(ortho-nitro)benzyl (mono-DMNB)-T6P 4 was more reactive at higher wavelengths. Although release with higher light intensity/photon flux (125 W/365 µmol m⁻² s⁻¹ (with photon flux defined as the number of photons per m² per second) compared to 8 W/23 µmol m⁻² s⁻¹) was more rapid, direct sunlight proved sufficient, in some cases resulting in t₉₅ as brief as 90 min (for (ortho-nitrophenyl)ethyl (oNPE)-T6P (compound 3)). Nuclear magnetic resonance spectroscopy analysis (Supplementary Methods and Extended Data Fig. 1e, f) confirmed that T6P was formed and that potent inhibitory activity against SnRK1 was induced (Extended Data Fig. 2).

Following successful in vitro release, uptake in planta was examined. Compounds 1-4 (at a final concentration of 1 mM) were fed to roots of plantlets of Arabidopsis thaliana and the aerial parts were analysed over time and with increasing dose (Fig. 2 and Supplementary Tables 2-7). High-performance liquid chromatography (HPLC) followed by quantitative mass spectrometry (HPLC-MS) of extracts of the aboveground biomass (shoot and leaves) showed increasing uptake of the compounds over time (Fig. 2a and Extended Data Fig. 3) and with increasing dose. Consistent with design, structural variation showed that altered hydrophobicity modulated permeability and transport. Notably, systematic variation of group type and copy number identified compound 3 (log[P] of the compound is 0.11 ± 0.60, where P is the partition coefficient; see also Supplementary Methods and Supplementary Table 17) as the most readily taken up (Fig. 2a), with absorption of approximately 20% after 72 h. Compounds 1, 2 (dimethoxy(ortho-nitro)benzyl (DMNB)-T6P) and 4 (log[P] of the compounds ranged from −2.35 to −0.17), on the other hand, were less readily taken up.

Next, we investigated the light-activated release in planta. Plants were treated with compounds dissolved in medium, grown for a further three days, irradiated, and the shoots were harvested and extracted. T6P release was confirmed by tandem mass spectrometry (Extended Data Fig. 4) and determined by quantitative HPLC-MS (with 2-deoxyglucose-6-phosphate as an internal standard; Fig. 2b, Extended Data Fig. 4 and Supplementary Table 10). Release in planta could be controlled and modulated by the choice of light source and signalling precursor (Fig. 2b and Supplementary Table 10).

Most transgenic approaches only alter T6P over a 2-3-fold range. Using compounds 1-4, levels of up to 900 nmol g⁻¹ fresh weight were attainable, which was 100-fold higher than endogenous levels and 75-fold higher than can be achieved with genetic methods. Consistent with this strategy, maximal T6P was released when precursor-treated plants were irradiated with the highest flux (100 W/292 µmol m⁻² s⁻¹ UV) in all cases. Notably, and with relevance to application in the field, with sunlight only, all treated plants released significantly increased amounts of T6P (39-296 nmol g⁻¹ fresh weight) (Supplementary Table 10), some approximately 4-30-fold above endogenous levels. There was no significant reduction in the fresh weight of plantlets treated with 1 mM of compound (Supplementary Methods and Extended Data Fig. 5), suggesting low toxicity. Accumulation of T6P after treatment was analysed with mass spectrometry imaging (Fig. 3) using signature-ion markers in treated leaves of A. thaliana seedlings (Fig. 3b) after 2 h of irradiation; the different distributions from compounds 2 and 3 appeared consistent with their measured release rates. Notably, increased trehalose was also observed in the same regions (Fig. 3c), suggesting metabolism. Moreover, mass spectrometry imaging using treatment-specific ions corroborated the uptake of precursors into leaves (Fig. 3d, e).

Figure 1 | Design and synthesis of signalling precursors of T6P. T6P is plant-impermeable; synthesis of plant-permeable variants allowed subsequent photo-activated release of T6P in planta. a, Designed precursors 1-4, with the backbone on the left and side chains (blue circle on the backbone) of the individual precursors on the right. b, Synthesis of the precursors using phosphoramidite chemistry (1-3) or direct phosphorylation chemistry (4) from the key intermediate compound 12. Universally ¹³C-labelled 2* was prepared in essentially the same manner (see Extended Data Fig. 7a). THF, tetrahydrofuran.

Figure 2 | In planta uptake of signalling precursors and T6P release. a, Uptake of compounds 1-4 (1 mM in medium) at 24, 48 and 72 h (data are shown as mean ± s.e.m., n = 3). FW, fresh weight. b, T6P released in planta (data are shown as mean ± s.e.m., n = 3). Compounds were applied 18 days after sowing and then irradiated for 72 h. otsA, A. thaliana overexpressing trehalose-phosphate synthase. Growth light (GL) and other conditions are defined as the following levels of photon flux: GL, 250 µmol m⁻² s⁻¹; GL + 8 W UV (365 nm), 23 µmol m⁻² s⁻¹; GL + 100 W UV, … (no cloud), 1,440 µmol m⁻² s⁻¹. ANOVA showed significant differences (P < 0.001) between treatments (water or precursor) for each regime. All treatments with precursor and UV showed significance (P < 0.001, least significant difference (LSD)) compared to water and UV. For growth light irradiance *P < 0.05; **P < 0.01 (LSD, data shown on a linear scale). See also Supplementary Table 10.

The dynamics of this enhanced in planta T6P release, and possible consequent metabolic products, were determined not only through quantitative HPLC-MS and/or enzymatic quantification, but also through the use of unnaturally enriched isotopic labelling of the signalling precursors, allowing for unambiguous delineation of their fate (Fig. 4, Extended Data Fig. 6 and Supplementary Methods). Thus, in 7-day-old A. thaliana seedlings, treatment with 1 mM of compound 2 or 3, fed for 24 h before exposure to light (UV 8 W/23 µmol m⁻² s⁻¹), led to peak T6P after 60 min (229 and 159 nmol g⁻¹ fresh weight, respectively), which declined over the following 2 days (Fig. 4a). Corresponding trehalose levels were also elevated, with peaks at around 2 h (up to a maximum of 134 nmol g⁻¹ fresh weight compared to control levels of 20 nmol g⁻¹ fresh weight; Fig. 4b), confirming the metabolism indicated by mass spectrometry imaging. Glucose, the next sugar in the pathway, was also increased but to a smaller degree, peaking at around 2-4 h (Fig. 4d). These levels are consistent with known low metabolic fluxes. Given known interrelationships, sucrose levels were also determined. Notably, these increased 2-3-fold over the first 2 h of irradiation (Fig. 4c) and correlated positively with T6P for both compounds 2 and 3 (Fig. 4), whereas fructose was minimally affected (Extended Data Fig. 6d). We confirmed that inhibited growth was not an explanation for sucrose accumulation and found instead that growth was stimulated by T6P (Extended Data Fig. 6e).

Creation of a ¹³C-isotopically-labelled variant 2* (Extended Data Fig. 7a) allowed direct tracking via 'mass-shifts' of the corresponding ions using mass spectrometry. Treatment with 2* led to release of ¹³C-T6P and consequent sequential metabolism (to ¹³C-trehalose and ¹³C-glucose) following essentially the same dynamics (Extended Data Fig. 7). Notably, using this method, mass labelling also showed that the compounds not only released, but also induced T6P. This induced T6P accounted for approximately half of the T6P measured at 30 min, thereby providing direct evidence of induction of de novo T6P synthesis, and this induction continued, giving rise to increased T6P accumulation over time (Extended Data Fig. 7). This could be due to the large increase in sucrose observed, as sucrose induces T6P. Together these data suggest that perturbation of T6P levels occurs via two modes of action: direct release from the signalling precursor and simultaneously induced biosynthesis of T6P by the plant.

Figure 3 | Mass spectrometry imaging of treated leaves. For all ion images, the pixel intensity scale represents the area under the corresponding m/z peak. a, Time-of-flight secondary ion mass spectrometry (ToF-SIMS) spectra from the surface of A. thaliana leaves (top three spectra). Marker ions m/z 156.8, 196.8 and 212.8 in T6P reference (bottom spectrum). b, T6P (three markers, green) in control (left), 3-treated (middle) and 2-treated (right) leaves, anti-colocalized with H₂O (m/z 18.0) and silicon substrate (m/z 27.9). c, T6P (markers, … and overlay with known epicuticular wax markers (red, negative mode) in the 3-treated leaf. d, e, Matrix-assisted laser desorption/ionization-mass spectrometry (MALDI-MS). d, Overlay of mean on-leaf spectra for control, 3-treated and 2-treated leaves. Lower panels show expansions for correlated markers. e, RGB-colour overlay images of marker ions; separate ion images are shown on right. Images in b, c and e are representative of three individual images.

Our data indicate that compound 3 is the plant-permeable signalling precursor with the greatest tissue uptake coupled with the greatest temporal control (consistent with its tuned permeability and its fastest release rates), allowing minimal application amounts (0.1 mM) while still enhancing T6P levels around 1.5-6.5-fold above endogenous levels without potentially detrimental disruption of metabolism. Plants were treated with precursor 3 for 72 h and were then subjected to a single 8-h period under growth lights supplemented with 8 W UV (Supplementary Table 10, generating around 21 nmol g⁻¹ fresh weight T6P) and harvested a day later.


Figure 4 | In planta T6P release and sugar metabolism over time. a–d, Seven-day-old A. thaliana seedlings grown in liquid culture were treated with 1 mM of either 3 or 2; control seedlings were treated with water. Seedlings were left under growth lights to take up the signalling precursors for 24 h, after which the plants were exposed to 23 µmol m⁻² s⁻¹ UV for 2 h. Measurements were taken 1 d after uptake (pre-UV), 30, 60 and 120 min after the initiation of UV treatment (23 µmol m⁻² s⁻¹), and 1 and 2 days after the initiation of UV treatment. a, T6P content. b, Trehalose content. c, Sucrose content. d, Glucose content. In all cases n = 3; *P < 0.05; **P < 0.01; Student's t-test; data are shown as mean ± s.e.m.


The mean starch level (63.2 µmol g⁻¹ fresh weight) determined was significantly higher (F(1,14) = 13.59; P = 0.002) than for water-treated plants (40.7 µmol g⁻¹ fresh weight; Extended Data Fig. 5d).

The production of T6P from compounds 1-4 involves fragmentation with concomitant release of side products. Although these are considered to be non-toxic, we nevertheless tested for any unexpected phenotypic changes. Glucose-6-phosphate (G6P) analogues 14-17 of compounds 1-4 were synthesized (Supplementary Methods and Extended Data Fig. 8) and compared for their activity; G6P-methyl-glycoside itself is inactive in planta and in all interactions with SnRK1 (Extended Data Fig. 8), so its light-activated release from analogues 14-17 provided a useful control. Analogues 14-17 showed similar light-activated release parameters to compounds 1-4 (Supplementary Table 11), and relative uptake performance was similarly dependent on the identity of the light-sensitive moiety (Extended Data Fig. 8 and Supplementary Tables 12-14). No toxicity was observed in any of the plants treated with up to 0.5 mM of analogues 14-17 (Supplementary Methods and Extended Data Fig. 5a-c), suggesting that the light-released moiety is benign. Critically, starch was not affected in controls treated with compound 16, the G6P analogue of T6P precursor 3.

The rate of starch synthesis in A. thaliana over a 12-h period (Extended Data Fig. 5f, g) indicated a flux (0.037 µmol min⁻¹ g⁻¹ fresh weight) nearly three times that of water-treated controls (0.013 µmol min⁻¹ g⁻¹ fresh weight). T6P is proposed to stimulate starch synthesis through redox activation of ADP-glucose pyrophosphorylase (AGPase), a rate-limiting enzyme. Consistent with this hypothesis, although not necessarily causally, plants treated with compound 3 had significantly higher AGPase activity (increased by 35%, Extended Data Fig. 5e). AGPase has been previously shown to affect starch turnover.

We also measured transcript levels of genes known to be associated with T6P. First, SnRK1 is a proposed target of T6P. SnRK1-induced (TPS5, bZIP11 (also known as GBF6) and UDPGDH (At3g29360)) and -repressed (TPS8 and ASN1) markers responded synchronously to the activation of the precursors in a manner consistent with known effects of T6P on SnRK1 activity (Extended Data Fig. 9). However, other markers (for example, UDPGDH (At3g29360) and bGAL4) showed clear temporal delay (Extended Data Fig. 9b); the observed synchronization of these 'secondary markers' only occurred after a day, which suggests that these could be later, downstream targets of T6P. Second, starch is also a proposed target of T6P. As starch levels were increased as a result of treatment, expression of starch biosynthetic genes was also analysed. Transcripts of APL3, SS3, BE1 (also known as EMB2729) and GBSS1 were increased up to fivefold (Extended Data Fig. 9c).
Figure 5 | Increased crop yield. a, Increased grain size (20 per tube) after spraying. b, Grain yield per plant with 1 mM of compounds 2 or 3. c, Starch content of the grain. *P < 0.05 compared to water control (Student's t-test). Data are shown as mean ± s.e.m. (n = 6).


These data from A. thaliana raise the noteworthy possibility of
enhanced starch synthesis in crops, potentially providing increased
yield. Signalling precursors 2 and 3 were applied (0.1, 1 and 10 mM)
to spring wheat (Triticum aestivum Cadenza) grown in a
controlled environment representative of summer in northern Europe.
Either the ears only or the whole plant were sprayed during the
grain-filling period (5, 10, 15 and 20 days post-anthesis (DPA), anthesis
being the flowering period) at mid-photoperiod. This increased grain yield
per plant through the formation of larger grain, particularly in plants
treated with 1 mM of compounds 2 or 3 (Fig. 5a, b). In these grains, starch
content increased by 13–20% (Fig. 5c). A trend towards higher levels of starch
and protein, expressed as a percentage of component content
per gram of grain, was also observed (Supplementary Table 16).

Figure 6 | Increased crop resilience. a, Plants after 20 d recovery following
a single application of 1 mM compound 3 or 2, 1 d before rewatering. b, Dry weight
(DWT) biomass from plants in a. c, Plants after a single application of 1 mM
compound 3 or 2, 1 d before rewatering, cut back at 5 d after rewatering and
left to regrow for 10 d. White arrow and line, cut-back point. d, Fresh weight
(FW) biomass of regrowth from c. c, d, *P < 0.05 compared to water-treated
control (Student’s t-test, n = 6). Data are shown as mean ± s.e.m.

Dose–response analysis showed that yield peaked at a precursor
concentration of 1 mM (Extended Data Fig. 10f). A minimal spray
at only 10 DPA increased yield substantially at 1 and 10 mM
doses (Extended Data Fig. 10g). Plants treated with compounds 2 and 3
stayed greener for longer than plants treated with water only, consistent
with their chlorophyll content (Extended Data Fig. 10a, b) and with previous
observations for genetically enhanced T6P content. T6P release
in the wheat grains treated with compounds 2 and 3 was enhanced
at 5 DPA (128 nmol g⁻¹ fresh weight and 81 nmol g⁻¹ fresh weight,
respectively) and further at 10 DPA (378 nmol g⁻¹ fresh weight and
300 nmol g⁻¹ fresh weight, respectively) (Extended Data Fig. 10c–e).
Trehalose levels were also higher (30–70 nmol g⁻¹ fresh weight
compared to endogenous levels of 13 nmol g⁻¹ fresh weight), consistent
with the metabolism observed in A. thaliana.
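As a quick arithmetic check on the wheat-grain values reported above, the sketch below expresses the trehalose range relative to the endogenous level, and the T6P rise between 5 and 10 DPA for each precursor. All numbers come from the text; the dictionary layout is only a convenience for this example.

```python
# Values from the text, in nmol g^-1 fresh weight of wheat grain.
ENDOGENOUS_TREHALOSE = 13.0
TREATED_TREHALOSE = (30.0, 70.0)  # range observed after precursor treatment

# T6P measured at (5 DPA, 10 DPA) for each signalling precursor.
T6P_BY_PRECURSOR = {
    "compound 2": (128.0, 378.0),
    "compound 3": (81.0, 300.0),
}

trehalose_fold = tuple(x / ENDOGENOUS_TREHALOSE for x in TREATED_TREHALOSE)
t6p_rise = {
    name: at_10dpa / at_5dpa
    for name, (at_5dpa, at_10dpa) in T6P_BY_PRECURSOR.items()
}

print("trehalose vs endogenous: %.1f- to %.1f-fold" % trehalose_fold)
for name, rise in t6p_rise.items():
    print(f"{name}: T6P rose {rise:.1f}-fold from 5 to 10 DPA")
```

This gives roughly 2.3- to 5.4-fold more trehalose than the endogenous level, and a further 3.0-fold (compound 2) and 3.7-fold (compound 3) rise in T6P between 5 and 10 DPA.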

Next, the effects of signalling precursors on plant resilience and 
recovery were analysed. Drought is still the biggest global factor limiting 
crop yields, even in developed countries. When 4-week-old wheat
plants were sprayed with compounds 2 or 3 (30 ml, 1 mM, once) after 9 
days of drought, the regrowth effects following resumption of watering 
1 day after treatment were substantial (Fig. 6a, b). Regrowth of new 
tissue from plants cut back after drought was also higher in precursor- 
treated plants (Fig. 6c, d). This demonstrated both growth of new tissue 
(resurrection response) and salvage and growth of existing tissue 
(recovery response). T6P solution alone gave identical results to water 
(Fig. 6), consistent with the inability of T6P to enter directly into plants, 
further highlighting the design principles of signalling precursors. 

In conclusion, we have shown here that a chemical strategy can 
directly control amounts of an important sugar signalling molecule 
in vivo. The collected data are consistent with the signalling action 
of released T6P. For example, the mass balance of added signalling 
precursor appears insufficient to simply act as a carbon source. That 
said, we do not discount other possible mechanisms behind the 
noteworthy traits that we have observed here. The apparent result of 
‘biosynthetic amplification’ observed from signalling precursors is, we 
believe, a promising concept; we calculate here up to 50-fold ‘molecular 
amplification’ of the plant sugar product compared to the precursor. 
It may therefore be possible to design a self-sustaining production 
strategy in which a fraction of the additional starch generated by this 
amplification is used as a feedstock chemical for eventual synthesis of 
the signalling precursors themselves (Supplementary Discussion and 
Supplementary Table 18). 

We speculate that this chemical approach also offers more temporal 
and strategic flexibility than genetic methods (for example, a ‘pulse’ 
to circumvent adaptation effects or in manipulating more genetically 
complex crops) as well as the prospect of providing an immediate boost 
to productivity at critical times in the plant life cycle (for example, to 
allow synchronicity with the sun or to rescue drought-stricken regions;
see Supplementary Discussion). The potential to contribute towards
global food security seems notable and immediate. Given the wide- 
spread importance of cell signalling and of carbohydrates in biology, 
this system may also have even wider utility. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS

Synthesis of signalling-precursor compounds 1–4. 1H-Tetrazole solution
(0.45 M in CH3CN; 0.6 ml, 0.24 mmol, 2.0 equiv.) was added to a stirred
solution of compound 12 (100 mg, 0.12 mmol, 1 equiv.) and bis(2-nitrobenzyl)-
N,N-diisopropylphosphoramidite (compound 9; 78.3 mg, 0.18 mmol, 1.5 equiv.) in
anhydrous CH2Cl2 (5 ml) under an argon atmosphere at 0 °C. The resulting reaction
mixture was stirred at 0–5 °C and progress of the reaction was monitored by
thin-layer chromatography (petroleum ether:ether, 8:2) and mass spectrometry. After
complete disappearance of starting material (1 h), tBuOOH (0.1 ml) was added
at 0 °C and stirring was continued for another 30 min. The reaction
mixture was then concentrated in vacuo and the residue was suspended in methanol
(2 ml) and stirred in the presence of 30 mg of Dowex-H+ resin for 1 h at room
temperature to globally remove trimethylsilyl groups. Dowex-H+ was removed
by filtration and the filtrate was concentrated; purification by flash chromatography
(water:isopropanol:ethyl acetate, 1:2:8) yielded compound 1
(70 mg) in 87% isolated yield. Similar reaction protocols were used for the
synthesis of compounds 2 and 3. Compound 4 was obtained when a stirred
solution of 12 (100 mg, 0.12 mmol) in pyridine (2 ml) at room temperature was
treated with POCl3 (0.012 ml, 0.132 mmol) for 10 min, followed by addition of
4,5-dimethoxy-2-nitrobenzyl alcohol (76.7 mg, 0.36 mmol) and continued stirring
for 1 h. The resulting reaction mixture was concentrated in vacuo to yield a crude
product mixture, which was treated with Dowex-H+ (30 mg) in methanol (2 ml).
After filtration, concentration in vacuo and flash chromatography purification
yielded compound 4 (45 mg, 62%) as a pure sticky solid. For additional details see
Supplementary Methods.
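The reagent quantities for the synthesis of compound 4 can be cross-checked by converting masses and volumes to millimoles. In the sketch below, the molar masses are computed from standard atomic weights and the POCl3 density (about 1.645 g ml⁻¹ at 25 °C) is a literature value; none of these constants is given in the text, and the helper names are illustrative.

```python
# Constants not stated in the text (standard values):
MW_POCL3 = 153.33          # g/mol, POCl3
DENSITY_POCL3 = 1.645      # g/mL at ~25 °C
MW_DMNB_OH = 213.19        # g/mol, 4,5-dimethoxy-2-nitrobenzyl alcohol (C9H11NO5)

SUBSTRATE_MMOL = 0.12      # compound 12, as stated in the text


def mmol_from_mass(mass_mg: float, mw_g_per_mol: float) -> float:
    """mg divided by g/mol gives mmol."""
    return mass_mg / mw_g_per_mol


def mmol_from_volume(volume_ml: float, density_g_per_ml: float,
                     mw_g_per_mol: float) -> float:
    """Convert a neat liquid volume to mmol via its density."""
    mass_mg = volume_ml * density_g_per_ml * 1000.0
    return mmol_from_mass(mass_mg, mw_g_per_mol)


pocl3 = mmol_from_volume(0.012, DENSITY_POCL3, MW_POCL3)
dmnb_oh = mmol_from_mass(76.7, MW_DMNB_OH)

print(f"POCl3: {pocl3:.3f} mmol ({pocl3 / SUBSTRATE_MMOL:.2f} equiv.)")
print(f"DMNB alcohol: {dmnb_oh:.3f} mmol ({dmnb_oh / SUBSTRATE_MMOL:.2f} equiv.)")
```

The alcohol figure reproduces the stated 0.36 mmol (3 equiv.). The POCl3 estimate of about 0.129 mmol is close to, though slightly below, the quoted 0.132 mmol; the difference depends on the density value assumed.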

In planta uptake of signalling-precursor compounds and release of
trehalose-6-phosphate and metabolites. In planta uptake was carried out using
A. thaliana plantlets. A. thaliana (Columbia-0) seeds were surface-sterilized for
10 min in 10% sodium hypochlorite, 0.01% Triton X-100, then copiously washed with
sterile water and stratified for 3 d at 4 °C. Seeds were sown onto 0.5 ml solid
medium (0.5× Murashige and Skoog medium with Gamborg’s vitamins (Sigma P0404), 0.5%
sucrose and 0.5% agar) in 0.5-ml Eppendorf tubes pierced in the bottom with a tiny
hole. The tubes were arrayed in hand-cut polystyrene racks in Phyta trays (Sigma)
and floated on liquid medium (as the solid medium but lacking sucrose and
agar). Plantlets were grown under the following conditions: 12-h day under Philips
Master TL-D 840/58W fluorescent lights providing 250 µmol m⁻² s⁻¹, with 23 °C
day and 18 °C night temperatures. At 18 days after sowing, the liquid medium was
removed and the tubes were sealed with electrician’s tape. All plants were topped
up with 0.5× Murashige and Skoog medium without sucrose.

Plants were treated with compounds by adding 10 µl of a 50 mM stock prepared
in water or 1% DMSO to the agar medium, avoiding contact with aerial parts.
The final concentration of the precursor compound in the agar medium was
1 mM. After 24, 48 or 72 h, the aerial part was
harvested carefully, weighed and extracted in H2O:MeOH (1:1) under liquid
nitrogen. The crude fresh plant extract thus obtained was analysed by mass
spectrometry and HPLC.
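The stated 1 mM final concentration follows from a simple C1V1 = C2V2 dilution if the 10 µl of 50 mM stock is referred to the 0.5 ml of solid medium in each tube. The sketch treats that 0.5 ml as the dilution volume, which is an assumption: the volume of medium used for topping up is not stated.

```python
def final_concentration_mM(stock_mM: float, added_ul: float, medium_ul: float,
                           include_added_volume: bool = False) -> float:
    """Final concentration from C1 * V1 = C2 * V2.

    By default the small spiked volume is neglected (V2 = medium volume);
    set include_added_volume=True to add it to V2.
    """
    total_ul = medium_ul + added_ul if include_added_volume else medium_ul
    return stock_mM * added_ul / total_ul


# 10 µl of a 50 mM stock into 0.5 ml (500 µl) of agar medium.
print(final_concentration_mM(50.0, 10.0, 500.0))  # 1.0
print(round(final_concentration_mM(50.0, 10.0, 500.0,
                                   include_added_volume=True), 3))  # 0.98
```

Counting the added 10 µl changes the result by only about 2%, so the nominal 1 mM is a reasonable description either way.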

For in planta T6P release experiments, compound-treated plants were exposed
to UV-light treatment after 72 h. UV treatments consisted of: (a) 8-h exposure to
natural daylight; (b) 8-h exposure to a 100 W UV spotlight (BlackRay B-100AP) at
a distance of 18 cm; (c) 8-h exposure to an 8 W UV bulb (365 nm, Gelman
transilluminator Model 51438) at a distance of 6 cm; or (d) exposure for two 8-h
periods to the 8 W UV bulb. UV treatments were in addition to normal growth lights.
Control plants (except for the daylight treatment) were treated under the same
conditions but without UV light. At the end of the day of exposure to UV or visible
light, the aerial parts of the plants were quickly harvested, weighed and frozen in
liquid nitrogen. For starch extractions, a moderate light regime was selected: a
single 8-h exposure to 8 W UV light. After irradiation, plants were returned to the
growth room for a further day (day 5) to recover from the light treatment and to
respond to altered T6P levels before being harvested as above. Frozen tissue was
stored at −80 °C until extracted.

Harvested plant material was extracted by liquid/liquid extraction (LLE)
followed by solid phase extraction (SPE) for T6P analysis. For LLE/SPE
extractions, around 25 mg of plant tissue, pooled from several plants, was used.
Samples were reconstituted in 50 µl of H2O:MeOH (1:1) and 10 µl was used for
T6P analysis; T6P release was determined using quantitative HPLC-MS (Quattro,
Waters), with 2-deoxy-glucose-6-phosphate as a calibration internal standard.
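Quantification against an internal standard usually means fitting a calibration line of analyte/internal-standard peak-area ratio against analyte concentration, then inverting that line for each sample. The sketch below shows the calculation with an ordinary least-squares fit. The calibration levels follow Extended Data Fig. 4 (5–500 µM T6P with 100 µM 2DG6P), but the peak-area ratios are synthetic placeholders, not measured data.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantify(analyte_area: float, istd_area: float,
             slope: float, intercept: float) -> float:
    """Invert the calibration line for one sample's area ratio."""
    ratio = analyte_area / istd_area
    return (ratio - intercept) / slope


# Calibration levels (µM T6P) as in Extended Data Fig. 4; the ratios below
# are SYNTHETIC (perfectly linear) and only illustrate the arithmetic.
levels_uM = [5, 10, 25, 50, 100, 250, 500]
synthetic_ratios = [0.01 * c for c in levels_uM]

slope, intercept = fit_line(levels_uM, synthetic_ratios)
sample_uM = quantify(analyte_area=2500.0, istd_area=5000.0,
                     slope=slope, intercept=intercept)
print(f"sample T6P: {sample_uM:.1f} uM")  # 50.0
```

Because the calibration spans a 100-fold range, a weighted fit (for example 1/x weighting) may be preferable with real data. The unweighted fit is kept here for brevity.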

Liquid chromatography tandem mass spectrometry (LC-MS/MS) was used
to confirm the identity of disaccharide monophosphates via fragmentation
pattern analysis. This analysis was performed on a Waters Xevo G2-S QTof
(quadrupole time-of-flight) mass spectrometer coupled to a Waters Acquity Ultra
Performance Liquid Chromatography (UPLC) system, and on a Waters Micromass Quattro
micro API mass spectrometer coupled to a Waters 1525 binary HPLC pump and a Waters
2777 autosampler, using a SIELC Primesep SB column. Solvent A (0.1% formic
acid in H2O) and solvent B (1.0% formic acid in H2O:CH3CN (75:25)) were used
as the mobile phase at a flow rate of 0.4 ml min⁻¹. For the Xevo G2-S QTof MS,
the electrospray source was operated with a capillary voltage of 2.0 kV and a cone
voltage of 30 V. Nitrogen was used as the desolvation gas at a total flow of 800 l h⁻¹.
The intact molecular ion of T6P was detected at m/z 421.0759 (C12H22O14P⁻,
calculated 421.0753) in negative ion mode. The time-of-flight (ToF) tandem
mass spectrum of the parent ion 421.00 was then obtained in negative ion mode
for the m/z range 50 to 500 using an optimized collision energy of 20 eV. For
the Quattro micro API MS, the electrospray source was operated with a capillary
voltage of 3.0 kV and a cone voltage of 40 V. The quadrupole tandem mass spectra
of the parent ion 421.0 of T6P and S6P were obtained in negative ion mode for
the m/z range 50 to 500 using a collision energy of 20 eV. With reference
to standard fragmentation patterns, the fragment ions of T6P in the plant sample
were also tracked by quadrupole tandem mass spectrometry. The five
most intense peaks recorded in the MS/MS spectrum of T6P (m/z 78.3, 96.4,
138.6, 240.9 and 421.0 (unfragmented)) were also selected for multiple reaction
monitoring (MRM) and cross-referenced with selected ion recordings (SIR) for
the intact molecular ion m/z 421.0.
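The calculated m/z of 421.0753 for the deprotonated T6P ion can be reproduced from monoisotopic atomic masses, and the same calculation gives the mass error of the observed 421.0759 ion. The isotope masses below are standard reference values, not taken from the text, and the neutral formula C12H23O14P is assumed for T6P.

```python
# Monoisotopic masses (u) of the most abundant isotopes; electron mass in u.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "O": 15.99491461956,
    "P": 30.97376163,
}
ELECTRON = 0.00054858


def neutral_mass(formula: dict) -> float:
    """Sum of monoisotopic masses for an element -> count mapping."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def deprotonated_mz(formula: dict) -> float:
    """m/z of [M - H]-: lose a proton, i.e. a hydrogen atom minus one electron."""
    return neutral_mass(formula) - MONOISOTOPIC["H"] + ELECTRON


T6P = {"C": 12, "H": 23, "O": 14, "P": 1}
calc = deprotonated_mz(T6P)
observed = 421.0759
ppm_error = (observed - calc) / calc * 1e6

print(f"calculated [M-H]-: {calc:.4f}")    # 421.0753
print(f"mass error: {ppm_error:.1f} ppm")  # 1.5
```

An error of about 1.5 ppm is well within what is usually expected of a QTof instrument.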

For seedling liquid culture, seeds of A. thaliana were grown in liquid culture as
described previously. Once the seedlings were 7 days old, oNPE-T6P (compound 3)
or DMNB-T6P (compound 2) was added to the growth medium to a final
concentration of 1 mM. Plants were left under growth lights to take up the
compounds for 24 h. To facilitate precursor release, plants were placed under
23 µmol m⁻² s⁻¹ UV for 2 h, after which they were returned to the previous
environmental conditions. Samples were taken for analysis before addition of the
compound, 1 day after addition, after 30, 60 and 120 min during UV treatment, and
again at 1 and 2 d after UV treatment. Samples were weighed, snap-frozen
and stored at −80 °C.
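The UV regimes in these Methods are given as photon flux densities and durations, so the total photon dose per unit area is simply their product. The sketch compares the 2-h, 23 µmol m⁻² s⁻¹ treatment used here with the 8-h, 23 µmol m⁻² s⁻¹ (8 W) regime reported for the starch experiment in Extended Data Fig. 5d. It ignores spectral differences and any attenuation by the culture vessels.

```python
def photon_dose_mol_per_m2(flux_umol_m2_s: float, duration_h: float) -> float:
    """Integrated photon dose (mol m^-2) for a constant photon flux density."""
    seconds = duration_h * 3600.0
    return flux_umol_m2_s * seconds * 1e-6  # µmol -> mol


seedling_uv = photon_dose_mol_per_m2(23.0, 2.0)   # liquid-culture uncaging
starch_uv = photon_dose_mol_per_m2(23.0, 8.0)     # 8 W, 8-h regime

print(f"2-h UV dose: {seedling_uv:.4f} mol m^-2")  # 0.1656
print(f"8-h UV dose: {starch_uv:.4f} mol m^-2")    # 0.6624
print(f"ratio: {starch_uv / seedling_uv:.1f}")     # 4.0
```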

For enzymatic sugar analysis, sugars were extracted from 5–10 mg of A. thaliana
tissue ground under liquid nitrogen: 1 ml of 80% ethanol was added and the sample
was heated at 100 °C for 1 h, then centrifuged for 10 min at 13,000g to remove debris.
The samples were added to assay buffer. Enzymatic reactions were performed
as described previously using hexokinase, glucose-6-phosphate dehydrogenase,
phosphoglucose isomerase and invertase from Sigma-Aldrich (H4502, G8404,
P5381 and I9274, respectively). Two technical replicates were carried out for each
sample, and a total of three biological replicates were analysed. See Supplementary
Methods for further details.
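Coupled assays of this kind are commonly run sequentially in one well. Hexokinase plus glucose-6-phosphate dehydrogenase report glucose; phosphoglucose isomerase then adds the fructose signal; invertase finally releases glucose and fructose from sucrose, so each sucrose yields two NADPH. The sketch below converts the 340-nm absorbance step after each addition into nanomoles using the NADPH extinction coefficient (6.22 mM⁻¹ cm⁻¹). The addition order, path length, well volume and absorbance values are assumptions for illustration; the actual protocol is in the Supplementary Methods.

```python
NADPH_EPSILON_340 = 6.22  # mM^-1 cm^-1


def nadph_nmol(delta_a340: float, path_cm: float, well_volume_ul: float) -> float:
    """Beer-Lambert: concentration (mM) = A / (epsilon * l); mM x µl = nmol."""
    conc_mM = delta_a340 / (NADPH_EPSILON_340 * path_cm)
    return conc_mM * well_volume_ul


def sequential_sugars(d_glc: float, d_fru: float, d_suc: float,
                      path_cm: float, well_volume_ul: float) -> dict:
    """nmol of each sugar in the well from the A340 step after each enzyme addition."""
    return {
        "glucose": nadph_nmol(d_glc, path_cm, well_volume_ul),
        "fructose": nadph_nmol(d_fru, path_cm, well_volume_ul),
        # One sucrose -> glucose + fructose -> two NADPH.
        "sucrose": nadph_nmol(d_suc, path_cm, well_volume_ul) / 2.0,
    }


# Hypothetical absorbance steps, 1-cm path, 200-µl well.
print(sequential_sugars(0.622, 0.311, 0.622, path_cm=1.0, well_volume_ul=200.0))
```

In a microtitre plate the effective path length depends on the fill volume, so `path_cm` would need to be measured or corrected for rather than assumed to be 1 cm.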

Extraction and measurement of starch in planta. Three or four chemical- and
UV-light-treated plantlets were pooled and weighed (fresh weight, 70–100 mg) for
each biological replicate. Extraction was based on literature methods. Samples
were ground in liquid nitrogen to a fine powder in a mortar. The powder was
rapidly extracted with 1 ml 80% ethanol at 80 °C, followed by 2 × 0.5 ml to rinse;
samples were then transferred to 2-ml Eppendorf tubes at 100 °C and heated for
2–3 min until just boiling. Tubes were transferred to a water bath at 80 °C while
other samples were accumulated. Samples were centrifuged at 13,000g for 10 min
to collect all solid material. The pellet was extracted twice more with 2 ml of 80%
hot ethanol. The pellet was washed with 1 ml water, the supernatant removed
and 100 µl water added. The pellet was homogenized to a smooth consistency
with an Eppendorf micropestle before being made up to a final volume of 500 µl
with water. Samples were heated at 100 °C for 10 min to gelatinize starch granules.
Duplicate aliquots (100 µl) were removed and digested with α-amylase (2 U) and
amyloglucosidase (6 U) in 0.05 M sodium acetate, pH 4.8, for 4 h at 37 °C. Control
digests without enzyme were also set up. Glucose released from digested starch
was measured using an enzymatic assay coupled to the reduction of NADP⁺ to
NADPH, adapted for a microtitre plate reader; 10–20 µl of digest was assayed
in triplicate. Starch content is expressed as hexose equivalents per gram fresh
weight. See Supplementary Methods for further details.

SnRK1 activity. Kinase activities were determined by measuring the incorporation
of radiolabelled phosphate into the AMARA peptide (AMARAASAAALARRR;
Biomol International) substrate, as described previously.
ADP-glucose pyrophosphorylase activity. Enzyme activity was measured as
described previously.

Application of precursors to wheat. Spring wheat (T. aestivum Cadenza) seeds
were sown in Rothamsted standard compost mix and grown in controlled
environment conditions with a photoperiod of 16 h light, 8 h dark, day/night
temperatures of 20 °C/16 °C, photon-flux density of 600 µmol m⁻² s⁻¹, and ambient
relative humidity. Once the plants had reached anthesis, solutions of oNPE-T6P
(compound 3) and DMNB-T6P (compound 2) (0.1, 1 or 10 mM), as well as control
solutions (water, 1 mM T6P or 1 mM trehalose), were made up in distilled water with
0.1% TWEEN-20. At 5, 10, 15 and 20 days after anthesis, either the ears or the
whole plant were sprayed individually with 5 or 50 ml of precursor, respectively.
Leaf samples were taken at 5, 10, 15, 20 and 25 days after anthesis for chlorophyll
content analysis, and grain was harvested at maturity for analysis. Chlorophyll
content of leaves was measured by methanol extraction and spectrophotometry.
Starch content of grain was measured enzymatically and protein content was
measured by Bradford’s assay.

For the drought treatment, vegetative Cadenza wheat plants were grown in 
the same compost and environments as above. Once the plants had reached 
Feekes stage 4, water was withheld for 10 days. On day 9, 30-ml, 1 mM solutions 
of oNPE-T6P (compound 3) and DMNB-T6P (compound 2) were applied to all 
above-ground biomass; on day 10 the watering schedule was reinstated. Plants were
harvested to measure biomass production every 5 days for 30 days after rewatering. 
Both experiments were completed in biological replicates of six. 

For quantification of T6P, trehalose and sucrose in wheat samples, the harvested 
wheat grains were weighed, snap-frozen and stored at −80 °C. Wheat grain was
ground to a fine powder in liquid nitrogen and the sugars were extracted by LLE for
T6P, trehalose and sucrose analysis using the same LC-MS quantification method
as for A. thaliana. 

For minimal spray application, spring wheat (T. aestivum Cadenza) seeds were 
sown in Rothamsted standard compost mix and grown in controlled environment 
conditions with a photoperiod of 16 h light, 8 h dark, day/night temperatures of
20 °C/16 °C, photon-flux density of 600 µmol m⁻² s⁻¹ and ambient relative humidity.
Once the plants had reached anthesis, solutions of compounds 2 or 3 (1 mM and
10 mM) and a water control were made up in distilled water with 0.1% TWEEN-20.
At 10 days after anthesis, the top 20 cm of above-ground biomass, encompassing
ears and flag leaves, was sprayed individually with 25 ml of the compounds. Grain
from individual ears was harvested at maturity for analysis. All wheat experiments 
were repeated twice, with 3 technical and 6 biological replicates completed at each 
stage of analysis. 

RNA extraction, cDNA synthesis and qRT-PCR. Total RNA was extracted from
50 mg of snap-frozen leaf tissue from A. thaliana Columbia using the RiboPure Kit
(Ambion) according to the manufacturer’s instructions. RNA was quantified using a
NanoDrop spectrophotometer and RNA integrity was visualized using denaturing
agarose gel electrophoresis. DNA was removed using RQ1 RNase-free DNase
(Promega). cDNA was synthesized using the SuperScript III First-Strand Synthesis
System (ThermoFisher Scientific) with 2 µg of total RNA and oligo-dT primers
according to the manufacturer’s instructions. Gene expression was quantified using
SYBR Green chemistry on a 7500 Real-Time PCR System (Applied Biosystems).
Total reaction volume was 20 µl, containing 10 µl SYBR Green JumpStart Taq ReadyMix
(Sigma-Aldrich), 2 µl cDNA and 0.5 mM primers. PCR used an initial denaturation
stage of 95 °C for 2 min, followed by 40 cycles of 95 °C for 15 s and 60 °C for 1 min.
The specificity of products was confirmed by a temperature gradient (melt-curve)
analysis of products at temperatures ranging from 55 °C to 95 °C in 0.5 °C increments.
Two technical replicates were completed for each sample, and a total of three
biological replicates were analysed. Relative quantification of gene expression was
performed using the Livak method, with ubiquitin-transferase family protein as the
reference gene. Primers used for SnRK1 marker gene expression and starch gene
expression are listed in Supplementary Tables 19 and 20, respectively.
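The Livak (2^−ΔΔCt) method named above normalizes each target gene to the reference gene and then to the control condition. Below is a minimal sketch, assuming close to 100% amplification efficiency for both target and reference (the method's key assumption); the Ct values in the example are hypothetical.

```python
def livak_fold_change(ct_target_treated: float, ct_ref_treated: float,
                      ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression 2^-ddCt (Livak method).

    dCt  = Ct(target) - Ct(reference) within each condition
    ddCt = dCt(treated) - dCt(control)
    """
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: relative to the reference gene, the target crosses
# threshold 2 cycles earlier in treated plants than in controls.
print(livak_fold_change(22.0, 18.0, 24.0, 18.0))  # 4.0
```

Technical replicates would typically be averaged before this step, with the biological replicates carried forward for statistics.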

Mass spectrometry imaging methods. A. thaliana plants used in ToF-SIMS
(time-of-flight secondary ion mass spectrometry) imaging were grown in Petri dishes
on 0.5× Murashige and Skoog medium with 0.8% agar for 10 d, with a photoperiod of
16 h light, 8 h dark, day/night temperatures of 23 °C/18 °C and photon-flux density
of 250 µmol m⁻² s⁻¹. Plants were then transferred to Petri dishes containing the
same medium supplemented with 1 mM of either compound 2 or 3 for 24 h, during
which they remained under the previously stated growth conditions. After 24 h,
the plants were exposed to UV light at 23 µmol m⁻² s⁻¹ for 2 h to facilitate T6P
release. Plants were left for 2 h to recover, then frozen and dehydrated in a vacuum
chamber before mass spectrometry imaging analysis. Reference materials were
drop-dried on clean substrates. ToF-SIMS imaging analysis
was performed with a ToF-SIMS IV mass spectrometer (IONTOF, Muenster,
Germany) on three leaves: a control and one leaf each treated with compound
2 or 3. A pulsed 25-keV Bi3+ primary ion source was used as the analysis beam
(pulse width, 23 ns; mass resolution (m/Δm), 5,000). Mass spectra of the reference
material were obtained in positive and negative ion mode at a primary ion dose
of 1.1 × 10¹¹ ions cm⁻². The leaf ion images were also collected in both polarities
with a dose of 5.4 × 10¹⁰ ions cm⁻². An electron flood gun was employed for charge
compensation during data acquisition. Mass spectrometry data were analysed in
ION-TOF SurfaceLab 6.4 software and further processed in MATLAB and Origin
Pro. Known melissic acid markers (m/z 435.4, C30H59O⁺, [M − OH]⁺; and m/z 451.4,
C30H59O2⁻, [M − H]⁻) were used. See Supplementary Methods for further details.

MALDI-MS imaging data were acquired from identically prepared leaf
samples with a modified QSTAR XL Qq-ToF instrument (Sciex, Ontario, Canada)
fitted with a Nd:YAG laser (Elforlight Ltd, Daventry, UK) operated at 1,000 kHz
in positive ion mode with a fluence of approximately 205 J m⁻² and a pixel size of
200 µm × 200 µm. The QSTAR was operated in continuous raster sampling mode.
The sample was affixed to a stainless steel target plate with double-sided tape and
sprayed with CHCA (5 mg ml⁻¹ CHCA in 80% methanol, 0.1% TFA) using automated
spray deposition (TM Sprayer, HTX Technologies, Carrboro, USA). Data
were converted from the proprietary .wiff format into .mzML using AB MS Data
Converter version 1.3 (Sciex). These mzML files were then converted to .imzML
using imzMLConverter and processed in custom-made MATLAB software
(version R2014b, MathWorks Inc, USA). Images were created by summing across
the full width at half maximum of the peak of interest to give the intensity within
each corresponding pixel. See Supplementary Methods for further details.
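The image-construction step described above (summing intensities across the full width at half maximum of a peak in every pixel) can be sketched as follows. This is not the authors' MATLAB code. It assumes that each pixel's spectrum is available as paired m/z and intensity lists and that the FWHM window is derived from a single reference spectrum, and it ignores interpolation between sampled points.

```python
from typing import List, Sequence, Tuple

Spectrum = Tuple[Sequence[float], Sequence[float]]  # (m/z values, intensities)


def fwhm_window(mz: Sequence[float], intensity: Sequence[float],
                target_mz: float, search_da: float = 0.5) -> Tuple[float, float]:
    """Return (low, high) m/z bounds where the peak near target_mz is >= half its apex."""
    near = [i for i, m in enumerate(mz) if abs(m - target_mz) <= search_da]
    if not near:
        raise ValueError("no data points near the target m/z")
    apex = max(near, key=lambda i: intensity[i])
    half = intensity[apex] / 2.0
    left = right = apex
    while left > 0 and intensity[left - 1] >= half:
        left -= 1
    while right < len(mz) - 1 and intensity[right + 1] >= half:
        right += 1
    return mz[left], mz[right]


def ion_image(pixels: List[List[Spectrum]], low: float, high: float) -> List[List[float]]:
    """Sum intensities inside [low, high] for every pixel in a row-major grid."""
    return [
        [sum(i for m, i in zip(mz, inten) if low <= m <= high) for mz, inten in row]
        for row in pixels
    ]


# Tiny synthetic example: a 1 x 2 image with a peak near m/z 421.0.
axis = [420.8, 420.9, 421.0, 421.1, 421.2]
reference = [1.0, 6.0, 10.0, 6.0, 1.0]
low, high = fwhm_window(axis, reference, 421.0)
image = ion_image([[(axis, reference), (axis, [0.0, 1.0, 2.0, 1.0, 0.0])]], low, high)
print(low, high, image)  # 420.9 421.1 [[22.0, 4.0]]
```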
Statistical methods. ANOVA was applied to data to test for differences between 
treatments. A natural log transformation was used where necessary to ensure 
constant variance. The GENSTAT statistical system was used for this analysis. 
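The analysis above was run in GENSTAT. Purely to show what a one-way ANOVA on natural-log-transformed data computes, the sketch below derives the F statistic and its degrees of freedom from first principles. It does not reproduce GENSTAT output or LSD calculations. A P value would additionally require the F distribution, for example `scipy.stats.f.sf(F, df_between, df_within)` in SciPy.

```python
import math
from typing import List, Sequence, Tuple


def one_way_anova(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def log_transform(groups: Sequence[Sequence[float]]) -> List[List[float]]:
    """Natural-log transform (all values must be positive) to stabilise variance."""
    return [[math.log(x) for x in g] for g in groups]


# Hypothetical treatment groups whose logs are 1, 2, 3 and 4, 5, 6.
raw = [[math.e ** 1, math.e ** 2, math.e ** 3], [math.e ** 4, math.e ** 5, math.e ** 6]]
f_stat, df_b, df_w = one_way_anova(log_transform(raw))
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")  # F(1, 4) = 13.5
```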
Data availability. Primary data for Figs 2, 4, 5 and 6 are provided as spreadsheets. 
All other data are available on request. 
Extended Data Figure 1 | The central role of T6P in plants and design of
a chemical strategy for the control of T6P production. a, Photosynthesis
generates sucrose, which is translocated to growing regions of the plant.
Inside the cell, a pool of core metabolites serves as substrates for biosynthetic
processes that determine growth and productivity. T6P is synthesized 
from UDPG and G6P by trehalose 6-phosphate synthase (TPS) and 
therefore reflects the abundance of sucrose. It is broken down by trehalose 
phosphate phosphatase (TPP). Increasing T6P stimulates starch synthesis 
and inhibits SnRK1, a protein kinase central to energy conservation and 
survival during energy deprivation. Inhibition of SnRK1 by T6P thus 
diverts carbon skeleton consumption into biosynthetic processes. 

b, The trehalose biosynthetic pathway. c, T6P is plant-impermeable.
Plant-permeable variants allowed subsequent photo-activated release.
d, Generalized mechanism of light-activated release of precursors.
e, Release of T6P by light irradiation from signalling precursors 1–4
in vitro. ³¹P nuclear magnetic resonance spectroscopy at different
time points of light irradiation, confirming the activation of signalling
precursors (1–4) and release of T6P. Time points for compound 1: 0, 30,
60, 150 and 360 min; for compound 2: 0, 60, 120, 300, 420 and 600 min;
for compound 3: 0, 15, 30, 45 and 60 min; and for compound 4: 0, 60, 120,
240, 360 and 420 min. f, g, ¹H (f) and ³¹P (g) nuclear magnetic resonance
spectra after complete photolysis of the signalling precursor, confirming
the release of T6P.
Extended Data Figure 2 | Inhibition of SnRK1. Signalling precursors
(1–4), T6P released from 1–4 (r1, r2, r3, r4) and T6P standard (T6P)
were tested for inhibition of SnRK1 activity. T6P (0.26 mM) inhibits
SnRK1 activity to approximately 36% of the original activity. Signalling-precursor
compounds show no such inhibition, whereas UV-released
compounds show the same inhibition as free T6P. SnRK1 activity was
determined by the level of incorporation of phosphate into a peptide
substrate (min⁻¹ mg⁻¹ protein). Data are shown as mean ± s.e.m.; n = 3.
The activities of assays treated with precursors or released T6P were not
significantly different from their controls (P < 0.001, LSD) using one-way
ANOVA of data transformed with a natural log scale.
Extended Data Figure 3 | In planta uptake analysis of signalling
precursors 1–4. a, Schematic of protocol used for uptake analysis.
b, Calibration curves for oNB-T6P (1), DMNB-T6P (2), oNPE-T6P (3)
and mono-DMNB-T6P (4), respectively. Data are shown as mean ± s.e.m.;
n = 2. c, HPLC (left) and mass spectrometry (right) data, [M + Na]⁺
or [M − H]⁻, of pure signalling precursors 1–4. d, HPLC (left) and mass
spectrometry (right) data, [M + Na]⁺ or [M − H]⁻, of plant samples after
treatment with signalling precursors 1–4. In 3, the partially uncaged
molecule also accumulated and was detected (coloured light blue).
Extended Data Figure 4 | Extraction and quantification of T6P.
a, Schematic of the protocol used for preparation of samples for T6P
quantification. LLE, liquid/liquid extraction; SPE, solid phase extraction;
AEC-MS, anion exchange chromatography–mass spectrometry.
b, Liquid chromatograms of T6P, S6P and 2DG6P separation (top)
using conditions optimized in Supplementary Table 8 (entry 7), and
representative LC-MS chromatograms of extraction samples treated with
signalling precursors (middle) and water control (bottom). c, d, Liquid
chromatograms of different concentrations of T6P (500, 250, 100, 50,
25, 10 or 5 µM) using a constant concentration (100 µM) of 2DG6P as an
internal standard. e, Resulting calibration curves of the T6P peak area and
T6P/2DG6P ratio against T6P concentration (in µM) in water as well as in
the plant matrix. f–h, LC-MS/MS analysis of T6P and S6P from samples
of plant treated with compound 2. f, Fragmentation patterns of T6P (top)
and S6P (bottom) by quadrupole time-of-flight tandem mass spectrometry
(QToF-MS/MS) in negative ion mode. g, Fragmentation patterns of T6P
(top) and S6P (middle) by triple quadrupole tandem mass spectrometry
(QqQ-MS/MS) in negative ion mode, and tracking of T6P fragment ions
in the plant matrix (bottom). h, HPLC chromatograms of T6P/S6P by
selected ion recording (SIR) of the intact molecular ion (m/z 421.0) and
multiple reaction monitoring (MRM) of the fragment ions give the same
retention time for each compound. i, The LC-MS quantification method
through SIR and LC-MS/MS quantification method through MRM of the
T6P level in the DMNB-T6P (2)-treated plant sample. From bottom to top:
integration of the T6P trace (2,661) using SIR of m/z 421.0; integration of
the T6P trace (2,550) using MRM of m/z 78.6, 96.3, 138.7, 241.0 and 421.0;
and integration for each fragment ion: m/z 78.6 (801), m/z 96.3 (868), m/z 138.7
(76), m/z 241.0 (404) and m/z 421.0 (392).
Extended Data Figure 5 | Analysis of A. thaliana plantlets following
treatment. a–c, Phenotype analysis. a, Fresh weight of plantlets versus
concentration of signalling precursors of T6P (1–4) and G6P precursors
(14–17) in medium after three days (72 h) of uptake. Data are shown as
mean ± s.e.m.; n = 3. Each T6P precursor is shown (top) together with
its G6P analogue (bottom). The visual appearance of a typical plantlet was
analysed for a given concentration of precursors at the point of harvest.
b, Phenotype of plants at the end of light treatments. Plants were allowed
to take up compounds for 72 h and were then treated the next day with
light treatments. Light treatments: GL, growth light irradiance
250 µmol m⁻² s⁻¹. UV 8 W and UV 100 W were growth light irradiance
supplemented with UV light (365 nm). Daylight, part sun/part
cloud; irradiance between 250 µmol m⁻² s⁻¹ under cloud and
1,440 µmol m⁻² s⁻¹ under full sun. Compounds were fed to the
plants to a final concentration of 1 mM. The phenotype of plants fed with
compound 3 at a reduced final concentration of 0.1 mM is shown in
the right-hand panel. Scale, diameter of the plastic tube mouth = 10 mm.
c, Typical A. thaliana phenotypes in the starch experiment. Plants were
treated with a final medium concentration of 0.1 mM compound or water
for 72 h and then exposed to an 8-h, 8 W UV treatment. The plants were
allowed to recover for another 24 h and were harvested at the end of the
day, and the starch content was measured. No significant phenotypic
differences were observed between treatments. Water (left),
0.1 mM oNPE-T6P (compound 3) (middle), oNPE-G6P(1-OMe)
(compound 16) (right). Scale, tube diameter = 10 mm. d–g, Biosynthetic
effects of increasing T6P in planta. d, Starch level at the end of the day in
UV-treated (8 W, 23 µmol m⁻² s⁻¹, 8 h) plants fed with compound 3 + UV
is significantly higher than in plants treated with water + UV (n = 9; data
are shown as mean ± s.e.m.). Samples for starch were taken 1 day after
UV treatment. e, ADP-glucose pyrophosphorylase (AGPase) activity is
increased in UV + compound 3 plants compared to compound 3 only,
UV only, water only, UV + water-treated plants and plants treated with
compound 16 (n = 3; data are shown as mean ± s.e.m.). f, Starch synthesis
rate in UV-treated (20 µmol m⁻² s⁻¹, 8 h) plants treated with compound 3
(data are shown as mean ± s.e.m.; n = 3). g, Starch level at the beginning
(data are shown as mean ± s.e.m.; n = 3) and at the end (data are shown
as mean ± s.e.m.; n = 4) of the day in UV (20 µmol m⁻² s⁻¹, 8 h) + water-treated
(solid circles) and UV + compound 3-treated (empty circles)
plants. A. thaliana used in e–g were grown under a light regime of 12 h
day/12 h night, at 250 µmol m⁻² s⁻¹, and 23 °C day/18 °C night
temperatures, treated with compounds 18 d after sowing, and exposed
to UV light 72 h after addition of the compound. Asterisks in d–f denote
significance according to ANOVA (P = 0.002). Asterisk in g denotes
significance by one-way ANOVA (LSD 5% = 11.19).
Extended Data Figure 6 | Quantification of in planta metabolites.
a, LC-MS chromatograms of trehalose, sucrose, glucose and fructose
separation using a HILIC column; for details see Supplementary
Information. b, Liquid chromatograms and peak areas of different
concentrations of trehalose (100, 50, 25, 10 or 5 µM) and glucose
(500, 250, 100, 50, 25, 10 or 5 µM). c, Calibration curves of the trehalose
peak area against concentration (in µM) and of the glucose peak area
against concentration (in µM). d, e, As in Fig. 4, 7-day-old
A. thaliana seedlings grown in liquid culture were treated with 1 mM of
compound 3 or 2; control seedlings were treated with water. Seedlings
were left under growth lights for 24 h to allow uptake of the signalling
precursors, before exposure to 23 µmol m−2 s−1 UV for 2 h.
Measurements were taken 1 day after uptake (pre-UV), 30, 60 and 120 min
after initiation of UV treatment, and 1 and 2 d after initiation of UV
treatment. See Fig. 4 for T6P, trehalose, sucrose and glucose content;
fructose content (d) and fresh-weight biomass (e) are shown here.
*P < 0.05; **P < 0.01 (Student's t-test). Data are shown as
mean ± s.e.m. (n = 3).
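Panel c relies on an external-standard calibration: peak area is regressed on the known standard concentrations, and the fitted line is inverted to convert sample peak areas into concentrations. The minimal Python sketch below assumes a simple least-squares straight line (the legend does not state the fitting model). The concentrations mirror the trehalose standards in panel b, but the peak areas are hypothetical placeholders, not data from this study.

```python
# Sketch of an external-standard calibration, ASSUMING a linear model fitted by
# ordinary least squares. Peak areas below are HYPOTHETICAL placeholders.

def fit_calibration(conc, area):
    """Fit area = slope * conc + intercept by ordinary least squares."""
    n = len(conc)
    mean_x = sum(conc) / n
    mean_y = sum(area) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, area))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def area_to_conc(area, slope, intercept):
    """Invert the calibration line to estimate a sample concentration (µM)."""
    return (area - intercept) / slope

standards_um = [5, 10, 25, 50, 100]                 # trehalose standards (µM), panel b
areas = [1.0e4 * c + 2.0e3 for c in standards_um]   # hypothetical, perfectly linear
slope, intercept = fit_calibration(standards_um, areas)
print(f"slope = {slope:.1f}, intercept = {intercept:.1f}")
print(f"sample at area 402000 -> {area_to_conc(402_000, slope, intercept):.1f} µM")
```

In practice one would also check linearity (for example, the residuals or r²) before using the inverted line, and only within the calibrated range.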
Extended Data Figure 7 | Dynamics of ¹³C-T6P, ¹³C-trehalose,
¹³C-glucose and T6P in A. thaliana treated with ¹³C-labelled
precursor 2*. 1 mM of DMNB-¹³C-T6P (2*) was added to the growth
medium of 7-day-old A. thaliana seedlings. The plants were left under
growth light to allow uptake for 24 h and the uncaging was performed
under 23 µmol m−2 s−1 UV for 2 h. Samples were harvested for analysis
at different time points: pre-UV, 30, 60 and 120 min (after onset of
UV irradiation) and 1 and 2 days after UV irradiation. a, Synthesis of
universally ¹³C-labelled 2* in essentially the same manner as for 2.
b, Amount of ¹³C-T6P released over time in planta. c, Amount of
¹³C-trehalose accumulated. d, Amount of ¹³C-glucose accumulated.
e, Amount of endogenous T6P. f, Overview of ¹³C tracking of T6P and
metabolites. Data are shown as mean ± s.e.m.; n = 3; *P < 0.05; **P < 0.01
(Student's t-test).
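Most legends report mean ± s.e.m. with significance from a Student's t-test on n = 3 replicates per group. The sketch below shows that arithmetic under stated assumptions: the classic pooled-variance (equal-variance) form of the unpaired test, with |t| compared against the tabulated two-tailed critical value 2.776 for α = 0.05 and 4 degrees of freedom, rather than an exact P value. The replicate values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Two-tailed critical value of Student's t for alpha = 0.05 and df = 4
# (n = 3 per group), taken from standard t tables. Only valid for df = 4.
T_CRIT_DF4 = 2.776

def sem(values):
    """Standard error of the mean: sample SD / sqrt(n)."""
    return sqrt(variance(values) / len(values))

def student_t(a, b):
    """Unpaired Student's t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df

def is_significant(t, t_crit=T_CRIT_DF4):
    """Two-tailed decision at alpha = 0.05: reject H0 if |t| > critical value."""
    return abs(t) > t_crit

# Hypothetical replicate values (n = 3 per group), not data from this study
control = [1.0, 1.2, 0.8]
treated = [2.0, 2.2, 1.8]
t, df = student_t(treated, control)
print(f"treated: {mean(treated):.2f} ± {sem(treated):.3f} (mean ± s.e.m.)")
print(f"t = {t:.2f}, df = {df}, significant: {is_significant(t)}")
```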
Extended Data Figure 9 | Transcript abundance of genes involved in
starch synthesis and SnRK1 marker genes in response to caged T6P
precursor application to 7-day-old A. thaliana seedlings in liquid
culture. Seedlings were treated with a final concentration of 1 mM of
compound 2 or 3; uptake was allowed for 1 d under the growth lights
before treatment with 23 µmol m−2 s−1 UV light for 2 h to facilitate
uncaging. a, b, Transcript fold change of SnRK1 marker genes after 60 min
of UV treatment (a) and 1 d after UV treatment (b). Marker genes
normally downregulated by SnRK1: TPS5, UDPGDH (At3g29360) and
bZIP11; marker genes normally upregulated by SnRK1: TPS8, BGAL
and ASN1. c, Transcript fold change of starch synthesis genes after 60 min
of UV treatment. Genes involved in starch synthesis: APL3, SS3, BE1 and
GBSS1. d, Transcript fold change of starch degradation genes after 60 min
of UV treatment. Genes involved in starch degradation: BAM1, BAM3,
BAM4 and GWD3. Changes in transcripts for enzymes of degradation
were more equivocal, with GWD3 increasing and BAM genes showing
small changes or decreasing (BAM3). All data were normalized to a
ubiquitin control. Data are shown as mean ± s.e.m. of three independent
samples.
Extended Data Figure 10 | Additional effects in wheat. a, Chlorophyll
content of leaves after anthesis following ear treatments. b, Chlorophyll content
of leaves after anthesis following whole-plant treatments. c–e, T6P release and
metabolism in wheat. Developing wheat grains were treated with T6P
or compound 2 or 3 (all 1 mM) 5 or 10 days after anthesis (DPA) and
harvested 1 day later. c, Amount of T6P in wheat grains (n = 3; data
are shown as mean ± s.e.m.). d, Trehalose (n = 3; data are shown as
mean ± s.e.m.). e, Sucrose (n = 3; data are shown as mean ± s.e.m.).
*P < 0.05; **P < 0.01 (Student's t-test). f, Dose response of grain yield per
plant to T6P precursors (0.1, 1 or 10 mM oNPE-T6P (3) or DMNB-T6P (2),
and water, T6P and trehalose controls) sprayed onto ears (5 ml) or onto the
whole plant (50 ml) at 5, 10, 15 and 20 days after anthesis. g, Grain yield
per ear in response to a single-time-point spray (5 ml to the ear at 10 days
after anthesis). *P < 0.05 (f, g) compared to water control (Student's t-test).
Data are shown as mean ± s.e.m. (n = 6).


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


LETTER 


doi:10.1038/nature20602 


Integrin-YAP/TAZ-JNK cascade mediates
atheroprotective effect of unidirectional shear flow


Li Wang, Jiang-Yun Luo, Bochuan Li, Xiao Yu Tian, Li-Jing Chen, Yuhong Huang, Jian Liu, Dan Deng, Chi Wai Lau,
Song Wan, Ding Ai, King-Lun Kingston Mak, Ka Kui Tong, Kin Ming Kwan, Nanping Wang, Jeng-Jiann Chiu, Yi Zhu &
Yu Huang


The Yorkie homologues YAP (Yes-associated protein) and TAZ
(transcriptional coactivator with PDZ-binding motif, also known
as WWTR1), effectors of the Hippo pathway, have been identified
as mediators of mechanical stimuli¹. However, the role of
YAP/TAZ in haemodynamics-induced mechanotransduction and in the
pathogenesis of atherosclerosis remains unclear. Here we show that
endothelial YAP/TAZ activity is regulated by different patterns
of blood flow, and that YAP/TAZ inhibition suppresses inflammation
and retards atherogenesis. Atheroprone disturbed flow increases,
whereas atheroprotective unidirectional shear stress inhibits,
YAP/TAZ activity. Unidirectional shear stress activates integrin and
promotes integrin-Gα13 interaction, leading to RhoA inhibition
and YAP phosphorylation and suppression. YAP/TAZ inhibition
suppresses JNK signalling and downregulates pro-inflammatory
gene expression, thereby reducing monocyte attachment and
infiltration. In vivo, endothelial-specific YAP overexpression
exacerbates, whereas CRISPR/Cas9-mediated Yap knockdown in
endothelium retards, plaque formation in ApoE−/− mice. We also
show that several existing anti-atherosclerotic agents, such as statins,
inhibit YAP/TAZ transactivation. Notably, simvastatin
fails to suppress constitutively active YAP/TAZ-induced
pro-inflammatory gene expression in endothelial cells, indicating that
YAP/TAZ inhibition could contribute to the anti-inflammatory
effect of simvastatin. Furthermore, activation of integrin by oral
administration of MnCl2 reduces plaque formation. Taken together,
our results indicate that the integrin-Gα13-RhoA-YAP pathway holds
promise as a novel drug target against atherosclerosis.

Endothelial cells (ECs) are constantly exposed to mechanical forces 
generated by blood flow. Different shear forces induce distinct cellular 
responses. Disturbed flow is associated with vascular inflammation and 
focal distribution of atherosclerotic lesions, while steady unidirectional 
shear stress (USS) is anti-inflammatory and atheroprotective².

The Hippo pathway, a newly identified kinase cascade, is involved
in organ size control and tumour suppression. Activation of this
pathway leads to inhibition of the downstream effectors YAP/TAZ by
promoting their phosphorylation and cytoplasmic retention³. YAP/TAZ
have been reported to act as sensors for mechanical stimuli including matrix
stiffness, stretch and cell density¹. However, the role of YAP/TAZ in
haemodynamics-mediated signal transduction and atherosclerosis is
still unclear.

Indirect evidence implies possible involvement of YAP/TAZ
in atherogenesis. The well-characterized YAP/TAZ target genes
CTGF and CYR61 are highly expressed in human atherosclerotic
lesions⁴. Lysophosphatidic acid, a major atherogenic factor, is a
potent activator of YAP and TAZ⁵. Statins, the widely used
anti-atherosclerotic drugs, were identified as the strongest YAP inhibitors
among 640 clinically used drugs⁶. However, direct evidence for YAP/TAZ
activation in atherogenesis is still lacking.

Figure 1 | Haemodynamics regulates YAP phosphorylation, subcellular
localization, downstream gene expression and reporter gene activity in
ECs. a, Immunoblotting showing that YAP expression is higher in mouse aorta
with endothelium (+Endo) than in aorta without endothelium (−Endo).
b, USS promotes, while disturbed flow inhibits, YAP phosphorylation.
c, USS promotes YAP nuclear export in HUVECs. YAP was visualized
by immunostaining (green); nuclei were counterstained with propidium
iodide (PI, red). d, e, USS suppresses, while disturbed flow increases,
(d) 8×GTIIC-luc reporter gene activity and (e) expression of the YAP/TAZ
target genes CTGF and CYR61 (n = 3; compared with static (STA),
*P < 0.05 by two-tailed unpaired t-test). f, Immunoblotting showing that the YAP
phosphorylation level is higher in thoracic aorta (TA, straight) than
in aortic arch (AA, curved) from C57BL/6J mice. g, En face staining of
YAP in mouse aorta showing increased YAP nuclear localization in the inner
curvature of the aortic arch compared with the outer curvature and the thoracic
aorta (n(TA) = 6, n(AA, inner) = 3, n(AA, outer) = 3). h, i, Immunostaining (i) of
pYAP in rat abdominal aorta with surgical stenosis (h), showing increased
pYAP in the clipped region and decreased pYAP in the downstream region
(n = 3). Representative images of three separate experiments are shown.

We first found that mouse ECs express a higher level of YAP than
other cells in the aorta, indicating a possible role of YAP in maintaining
endothelium homeostasis (Fig. 1a). To investigate the impact of
haemodynamics on YAP activity, YAP phosphorylation (Ser127, 
pYAP) in human umbilical vein ECs (HUVECs) subjected to USS 
(12 dyn cm−2) or disturbed flow (0.5 ± 6 dyn cm−2, 1 Hz) was measured.
Interestingly, USS inhibited, while disturbed flow activated, YAP activity. 
pYAP increased in HUVECs and human aortic ECs exposed to USS 
(Fig. 1b and Extended Data Fig. 1a). Accordingly, increased YAP/TAZ 
cytoplasmic retention was observed in HUVECs subjected to USS 
(Fig. 1c and Extended Data Fig. 1b, c). Congruently, USS suppressed 
transactivation activity of YAP/TAZ, indicated by reduced YAP/TAZ 
responsive luciferase (8×GTIIC-luc) reporter gene activity and
downregulated expression of target genes (Fig. 1d, e). By contrast, disturbed
flow reduced pYAP (Fig. 1b, and Extended Data Fig. 1d), enhanced YAP/ 
TAZ reporter gene activity (Fig. 1d) and increased YAP/TAZ target gene 
expression (Fig. 1e and Extended Data Fig. 1e–g). To investigate the
effect of haemodynamics on YAP activity in vivo, we determined YAP 
phosphorylation and nuclear localization in segments of mouse aorta 
and showed that pYAP level in aortic arch, an area exposed to disturbed 
flow, was lower than in thoracic aorta, an area exposed to USS (Fig. 1f). 
Consistently, in outer curvature of aortic arch and thoracic aorta, 
where blood flow is unidirectional, YAP was predominantly localized 
in the cytoplasm, while in the inner curvature of aortic arch, where 
blood flow is disturbed, YAP was mainly localized in the nuclei (Fig. 1g 
and Extended Data Fig. 1h). Rat abdominal aorta cross-clamping is a
model used to generate different flow patterns in vivo⁷ (Fig. 1h). The
constricted region, where unidirectional flow is accelerated, exhibited the
highest pYAP levels. Modest pYAP levels were detected in the upstream
region, where blood flow is unidirectional, while low pYAP was observed
in the downstream region, where blood flow is disturbed (Fig. 1i).
Integrin β3 is a direct sensor for shear forces. The putative integrin
agonists RGD-containing peptide (GRGDSP) and MnCl2 can mimic the
effect of USS⁸. To determine whether USS induces YAP phosphorylation
through activating integrin, we examined USS-induced YAP phosphorylation
in HUVECs transfected with a loss-of-function integrin β3 mutant
lacking the cytoplasmic domain (β3Δcyto)⁹. We found that β3Δcyto
overexpression abolished USS-induced YAP phosphorylation (Fig. 2a).
Furthermore, treatment with GRGDSP or MnCl2 increased pYAP in
HUVECs (Fig. 2b and Extended Data Fig. 2a). In addition, GRGDSP
suppressed YAP/TAZ target gene expression (Extended Data Fig. 2d).
Congruently, MnCl2 induced YAP/TAZ nuclear export (Extended
Data Fig. 2b) and reduced YAP/TAZ reporter gene activity (Extended
Data Fig. 2c), whereas integrin β3 knockdown reversed MnCl2-induced
YAP phosphorylation (Fig. 2c). This evidence suggests that integrin
activation directly induces YAP phosphorylation.

A previous study has suggested that flow-derived pulling force
induces integrin activation by maintaining its extended conformation
(ligand-binding conformation)¹⁰. To test whether integrin β3 in
the extended conformation promotes YAP/TAZ phosphorylation, a
Leu33Pro point mutant of integrin β3 (Pro32Pro33 integrin) was
constructed to mimic integrin β3 activation¹¹. Indeed, Pro32Pro33
overexpression in HUVECs induced YAP phosphorylation (Fig. 2d),
downregulated YAP/TAZ target gene expression (Extended Data Fig. 2e)
and suppressed YAP/TAZ reporter gene activity (Extended Data
Fig. 2f), indicating that integrin β3 mediates USS-induced YAP inhibition.

RhoA is one of the most important upstream activators of
YAP/TAZ¹. Integrin engagement and USS suppress RhoA activity⁸,¹².
Therefore, we hypothesized that RhoA mediates integrin-induced
YAP/TAZ suppression. As expected, basal and USS- or MnCl2-induced YAP
phosphorylation was reduced in HUVECs transfected with constitutively
active RhoA (Q63L) (CA-RhoA) (Fig. 2e, f).

G-protein subunit Gα13 mediates integrin-induced RhoA
suppression¹³,¹⁴. Therefore, we investigated the effect of Gα13
knockdown on USS- or MnCl2-induced RhoA inhibition and
YAP phosphorylation. Neither USS nor MnCl2 induced YAP
phosphorylation when Gα13 was silenced (Fig. 2c, g). Consistently,
Gα13 knockdown reduced MnCl2-induced YAP nuclear export
(Extended Data Fig. 2g) and RhoA inhibition (Fig. 2h). Similarly, Gα13
knockdown mitigated GRGDSP-induced suppression of YAP/TAZ
target gene expression (Extended Data Fig. 2h).

Physical interaction between integrin β3 and Gα13 induces RhoA
inhibition¹³,¹⁵. To understand whether the interaction between integrin β3 and Gα13
mediates YAP phosphorylation, two myristoylated cell-permeable short
peptides, mSRI and mP6, which mimic the interaction domains of Gα13
and integrin β3, respectively, were used to selectively block the association
between Gα13 and integrin β3 (refs 13, 15). Similar to the effect of Gα13
or integrin β3 knockdown, mSRI or mP6 pretreatment abolished
MnCl2-induced suppression of YAP/TAZ reporter gene activity and
YAP phosphorylation in HUVECs (Extended Data Fig. 2c, i). Likewise,
overexpression of SRI (a Gα13-blocking peptide) in HUVECs abolished
USS-induced YAP phosphorylation (Fig. 2i).

Since haemodynamics is closely associated with the pathogenesis of
atherosclerosis, we compared the expression of YAP, pYAP, TAZ, Gα13
and integrin β3 in aortas of ApoE−/− mice with or without
Western-diet-induced atherosclerosis. Our results showed downregulation of
pYAP and Gα13, and upregulation of TAZ, in aortas with
atherosclerotic plaques (Fig. 2j). Consistent with a previous report¹⁶, we found
that integrin β3 was highly expressed in mouse aortas with atherosclerosis,
possibly as a compensatory response¹⁷. Immunofluorescence
also showed that YAP phosphorylation was reduced in the lesion area of
ApoE−/− mice and in human atherosclerotic aortas (Fig. 2k, l). Taken
together, our results reveal that integrin activation promotes
integrin-Gα13 association, which leads to RhoA suppression and subsequent
YAP phosphorylation.

To explore the mechanism of YAP/TAZ activation in atherogenesis,
we analysed messenger RNA (mRNA) profiles in HUVECs transfected
with constitutively active YAP (S127A) and TAZ (S89A) (CA-YAP/TAZ).
Four hundred and sixteen differentially expressed genes were
identified by RNA sequencing (RNA-seq) (P < 0.05 and fold-change
cut-off > 1.5). DAVID KEGG enrichment analysis¹⁸ revealed six
enriched pathways (Fig. 3a), including ‘leukocyte transendothelial
migration’, ‘ECM-receptor interaction’ and ‘cell adhesion molecules’.
Gene Ontology enrichment for biological process analysed by
ClueGO¹⁹ indicated that YAP/TAZ is associated with regulation of leukocyte
migration (Fig. 3b). Indeed, we observed more monocyte-endothelial
adhesion associated with YAP/TAZ activation in HUVECs (Fig. 3d and
Extended Data Fig. 3c). Moreover, several pro-inflammatory markers,
such as IL6, IL8 and SELE, were induced by YAP/TAZ activation
(Fig. 3c and Extended Data Fig. 3a). A promoter reporter assay showed that
CA-YAP/TAZ induced expression of adhesion molecules by enhancing
their transcription (Extended Data Fig. 3b). However, deletion of the
predicted TEAD binding sites, the known consensus DNA sequence for
YAP-TEAD binding²⁰, in the CXCL1 and SELE promoters failed to reverse
YAP/TAZ-induced reporter gene activity (data not shown), indicating that
other regulatory mechanisms might be involved. These results suggest
that endothelial YAP/TAZ activation participates in the initiation of
atherosclerosis by promoting monocyte adhesion.
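The gene-selection rule quoted above (P < 0.05 and fold-change cut-off > 1.5) amounts to a two-condition filter. The sketch below assumes that the cut-off is applied symmetrically to up- and downregulated genes (|log2 fold change| > log2 1.5), which the text does not state explicitly. The P values would come from whatever differential-expression tool was used (not specified here), and the gene records are hypothetical placeholders.

```python
from math import log2

FC_CUTOFF = 1.5   # fold-change cut-off stated in the text
P_CUTOFF = 0.05   # P-value threshold stated in the text

def is_differential(fold_change, p_value, fc_cutoff=FC_CUTOFF, p_cutoff=P_CUTOFF):
    """True if a gene passes both the P-value filter and the fold-change filter.

    ASSUMPTION: the cut-off is two-sided, so a 1.5-fold decrease (FC = 1/1.5)
    counts the same as a 1.5-fold increase.
    """
    return p_value < p_cutoff and abs(log2(fold_change)) > log2(fc_cutoff)

# Hypothetical records: (gene, fold change CA-YAP/TAZ vs control, P value)
genes = [
    ("GENE_A", 3.2, 0.001),  # strongly up, significant -> kept
    ("GENE_B", 0.5, 0.01),   # twofold down, significant -> kept
    ("GENE_C", 1.3, 0.001),  # significant but below the fold-change cut-off -> dropped
    ("GENE_D", 4.0, 0.20),   # large change but not significant -> dropped
]
deg = [name for name, fc, p in genes if is_differential(fc, p)]
print(deg)  # ['GENE_A', 'GENE_B']
```

Working on the log2 scale makes the up- and down-regulated thresholds symmetric, which is why the comparison uses `abs(log2(...))` instead of the raw ratio.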

Figure 3 | YAP/TAZ activation induces adhesion molecule expression through
increasing JNK activity. a, KEGG enrichment pathway analysis and (b) Gene
Ontology (GO) enrichment analysis for the mRNA profile in HUVECs transfected
with CA-YAP/TAZ. c, The JNK inhibitor SP600125 suppresses CA-YAP/TAZ-induced
inflammatory gene expression in HUVECs (n = 3; *P < 0.05 by two-tailed
unpaired t-test). d, CA-YAP/TAZ promotes monocyte attachment to HUVECs.
e, f, YAP/TAZ knockdown reduces (e) expression of the JNK target genes IL6
and IL8 and (f) PMA-induced AP-1 reporter gene activity (n = 3; *P < 0.05 by
two-tailed unpaired t-test). g, h, EC-specific YAP overexpression promotes
(g) atherosclerotic plaque formation, visualized by Oil Red O staining, and
(h) JNK activation, detected by immunostaining of pJNK (red) (n = 5;
a representative result is shown).

JNK is critical in atherogenesis²¹. USS inhibits tumour-necrosis
factor-α-induced JNK activation, while prolonged disturbed flow
activates JNK²²,²³. Our results showed that both USS and disturbed flow
transiently increased phospho-JNK. However, in contrast to sustained
JNK phosphorylation in HUVECs exposed to disturbed flow,
prolonged USS suppressed JNK phosphorylation (Extended Data Fig.
3d, e). Activity of the JNK effector activator protein (AP)-1 is reportedly
increased by YAP/TAZ through JNK-YAP interaction²⁴,²⁵. We therefore
hypothesized that YAP/TAZ promotes endothelial activation
through enhancing JNK activity. Indeed, the JNK inhibitor SP600125
suppressed YAP/TAZ-induced pro-inflammatory gene expression
(Fig. 3c). On the other hand, YAP/TAZ knockdown reduced basal and
phorbol ester (PMA)-induced phospho-JNK, expression of the JNK target
genes IL6 and IL8, and AP-1 reporter gene activity²⁷ (Fig. 3e, f
and Extended Data Fig. 3f). Dominant-negative YAP (YAP S94A)
suppressed PMA-induced AP-1 reporter gene activity, whereas
CA-YAP/TAZ enhanced AP-1 reporter gene activity (Extended Data
Fig. 3g, h). To assess whether YAP activates JNK and accelerates
atherosclerotic plaque formation in vivo, we generated EC-specific
YAP overexpression mice on an ApoE−/− background
(Tie2-Cre;Yap-COE;ApoE−/−, hereafter EC-Yap;ApoE−/−) (Extended Data Fig. 4a, b). After
4 weeks of feeding on a Western diet, EC-Yap;ApoE−/− mice showed
significantly increased plaque formation (Fig. 3g), accompanied by
increased expression of p-JNK and the macrophage marker Mac3 compared
with control littermates (Cont;ApoE−/−) (Fig. 3h and Extended
Data Fig. 4c, d). Similar total cholesterol and triglyceride levels
suggested that the atherogenic effect of endothelial YAP is unlikely to
be related to lipid metabolism (Extended Data Fig. 4e, f).

To demonstrate that disturbed-flow-associated atherosclerosis
is mediated by endothelial YAP activation in vivo, ApoE−/− mice
received partial ligation surgery on the left carotid artery to develop
disturbed-flow-enhanced atherosclerosis. EC-specific Yap knockdown
was achieved by using an EC-enhanced AAV-mediated CRISPR/Cas9
(ref. 28) genome-editing system controlled by the EC-specific ICAM2
promoter. Immunohistochemistry and western blotting showed
efficient Yap knockdown in ECs (Fig. 4a, b). Three weeks after surgery,
severe plaques developed in control ApoE−/− mice. However, mice
with EC-specific Yap knockdown exhibited reduced plaque formation
(Fig. 4c). Mice injected with adenovirus-mediated Taz short hairpin
RNA (shRNA) also showed delayed atherogenesis (Extended Data
Fig. 5a–c). Furthermore, oral administration of MnCl2 reduced plaque
formation in ApoE−/− mice on a Western diet for 12 weeks (Fig. 4d),
without affecting lipid profile or superoxide dismutase activity
(Extended Data Fig. 5h, i). Conversely, plaque formation increased
in mice injected with AAV expressing CA-YAP/TAZ (Extended Data
Fig. 5d–f). In summary, both gain- and loss-of-function experiments
in vivo show the importance of YAP/TAZ activation in atherogenesis.

To examine whether existing anti-atherosclerotic drugs inhibit
YAP/TAZ activity, we tested several agents (Extended Data Table 1).
In addition to statins, which inhibit YAP/TAZ in tumour cells⁶, apelin,
ApoA1 and niacin also suppressed YAP/TAZ activity (Fig. 4e). To
understand whether YAP/TAZ suppression contributes to the
anti-inflammatory effect of statins, we transfected HUVECs with CA-YAP/TAZ.
Compared with HUVECs transfected with vector control, simvastatin
failed to suppress expression of pro-inflammatory genes
induced by CA-YAP/TAZ, suggesting that YAP/TAZ inhibition might be
involved in the anti-inflammatory and anti-atherogenic effects of statins
(Fig. 4f).

Figure 4 | Suppression of YAP/TAZ activity retards atherogenesis.
a, b, An AAV-mediated CRISPR/Cas9 system specifically knocks down YAP
in the endothelium of ApoE−/− mice. Illustration (a, left) showing carotid
partial ligation surgery in ApoE−/− mice. YAP knockdown was confirmed by
(a, right) immunostaining (YAP (green), Vcam1 (red), nuclei (blue)) (n = 5;
a representative result is shown) and (b) immunoblotting of YAP in aorta.
c, EC-specific YAP knockdown reduces plaque formation in ApoE−/− mice
receiving carotid partial ligation (arrow) surgery. d, Oral administration
of MnCl2 decreases atherosclerotic plaque formation, visualized by
Oil Red O staining. e, YAP/TAZ reporter gene activity assay of anti-atherosclerotic
agents showing that statins produce the strongest inhibitory effect on
YAP/TAZ activity (n = 3; *P < 0.05 by two-tailed unpaired t-test). f, Simvastatin
suppresses expression of YAP/TAZ target genes while failing to reverse
CA-YAP/TAZ-induced expression of pro-inflammatory genes (n = 3;
*P < 0.05 by two-tailed unpaired t-test). NS, not significant. g, Illustration
of the haemodynamics-regulated YAP/TAZ signalling in ECs.

In summary, this study provides novel evidence that
endothelial YAP/TAZ activation induced by atheroprone disturbed
flow promotes inflammation and atherogenesis by enhancing JNK
activity, whereas atheroprotective USS inhibits YAP/TAZ by
modulating the integrin-Gα13-RhoA pathway (Fig. 4g). Endothelial
YAP/TAZ knockdown or MnCl2 treatment delays atherogenesis,
indicating that YAP/TAZ could become a therapeutic target
against atherosclerosis, a possibility supported by the YAP/TAZ-inhibitory
effect of several anti-atherosclerotic drugs, especially statins.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS

Antibodies. The antibodies used for western blotting included anti-YAP/TAZ
(1:1,000; 8418; Cell Signaling Technology, USA), anti-YAP (1:1,000; Cell
Signaling Technology, USA), anti-pYAP (1:1,000; Ser 127, 4911S; Cell Signaling
Technology, USA), anti-TAZ (1:1,000; ab84927; Abcam, UK), anti-JNK (1:1,000;
9252h; Cell Signaling Technology, USA), anti-pJNK (1:1,000; 9255; Cell Signaling
Technology, USA), anti-CTGF (1:1,000; ab6992; Abcam, UK), anti-Gα13 (1:1,000;
ab128900; Abcam, UK), anti-integrin β3 (1:1,000; 4702; Cell Signaling Technology,
USA), anti-RhoA (1:1,000; ab54835; Abcam, UK) and anti-eNOS (1:1,000;
BD Biosciences, USA).

The antibodies used for immunostaining included anti-pYAP (1:100; Ser 
127, 4911S; Cell Signaling Technology, USA), anti-YAP (1:100; Cell Signaling 
Technology, USA) and anti-pJNK (1:100; 9255; Cell Signaling Technology, USA). 
Quantitative real-time PCR. RNA was extracted by using TRIzol Reagent 
(Thermo) according to the manufacturer's protocol. cDNA was synthesized using 
a High-Capacity cDNA Reverse Transcription Kit (Thermo). Quantitative PCR 
was performed using SYBR Select (Thermo) following the manufacturer's protocol. 
GAPDH was used as the internal control. Primers used for quantitative real-time
PCR are listed in Supplementary Table 1.
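The Methods name GAPDH as the internal control but do not state how relative expression was calculated. A widely used approach is the comparative 2^−ΔΔCt method, which assumes near-100% and roughly equal amplification efficiency for the target and reference genes. The sketch below applies that method to hypothetical Ct values. It illustrates the general technique and should not be read as the authors' documented procedure.

```python
def fold_change_ddct(ct_target_treated, ct_ref_treated,
                     ct_target_control, ct_ref_control):
    """Relative expression by the comparative 2^-ΔΔCt method (ASSUMED here).

    The reference gene (e.g. GAPDH) corrects for input differences;
    efficiencies of target and reference are assumed to be ~100%.
    """
    d_ct_treated = ct_target_treated - ct_ref_treated   # ΔCt, treated sample
    d_ct_control = ct_target_control - ct_ref_control   # ΔCt, control sample
    dd_ct = d_ct_treated - d_ct_control                  # ΔΔCt
    return 2 ** (-dd_ct)

# Hypothetical Ct values: target gene amplifies 2 cycles earlier after treatment,
# while GAPDH is unchanged, giving a 2^2 = 4-fold increase.
print(fold_change_ddct(24.0, 18.0, 26.0, 18.0))  # 4.0
```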

Western blotting. Cells or tissues were homogenized in cold RIPA lysis buffer
supplemented with cOmplete Protease Inhibitor Cocktail and PhosSTOP
phosphatase inhibitor (Roche). Protein concentration was determined using
the Bradford assay (Bio-Rad). Ten micrograms of protein were resolved by
SDS-polyacrylamide gel electrophoresis and transferred to PVDF membranes
(Bio-Rad). Target proteins were detected using specific primary antibodies. Bound
antibodies were detected with horseradish-peroxidase-conjugated secondary
antibodies and visualized by enhanced chemiluminescence (Cell Signaling
Technology). Experiments were repeated three times and target protein
levels were quantified with ImageJ and normalized to the internal control (pYAP was
normalized to total YAP) (Extended Data Figs 6 and 7). Original western blot scans
are included in Supplementary Fig. 1.
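The quantification step above reduces to ratios of band intensities: each pYAP band is divided by the total YAP band from the same lane. The sketch below then expresses each condition relative to a control condition. That rescaling step, the condition names and the densitometry values are assumptions for illustration only.

```python
def normalize(signal, reference):
    """Ratio of a band to its normalizer, e.g. pYAP / total YAP."""
    return signal / reference

def relative_to_control(ratios, control):
    """Express each condition's ratio as a fold of the control condition (ASSUMED step)."""
    base = ratios[control]
    return {cond: r / base for cond, r in ratios.items()}

# Hypothetical ImageJ integrated densities (arbitrary units) from one blot
bands = {
    "static": {"pYAP": 1000.0, "YAP": 4000.0},
    "USS": {"pYAP": 2500.0, "YAP": 4000.0},
}
ratios = {cond: normalize(b["pYAP"], b["YAP"]) for cond, b in bands.items()}
fold = relative_to_control(ratios, "static")
print(fold)  # {'static': 1.0, 'USS': 2.5}
```

With three independent repeats, the same calculation would be run per blot and the resulting fold values summarized as mean ± s.e.m.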

Cell culture. HUVECs and human aortic ECs were purchased from Lonza (EGM,
Clonetics, Lonza, Walkersville, Maryland, USA). Lonza guarantees that the cells
express CD31/CD105 and von Willebrand factor (factor VIII) and are positive for
acetylated low-density lipoprotein uptake. We did not test for mycoplasma
contamination during the experiments. HUVECs were maintained in EGM supplemented
with EGS and FBS at 37 °C in an incubator with 95% humidified air and 5% CO2, and
passaged every 3 days. Cells within seven passages were used for the in vitro study.
GST-RBD pull-down for active RhoA detection. GST-RBD recombinant
protein was purified from Escherichia coli BL21 (DE3) and affinity-conjugated to
glutathione sepharose beads (Pharmacia). For GST affinity pull-down, 10⁷ cells
were lysed in 1 ml Weak Lysis Buffer (Beyotime) supplemented with protease
inhibitors (Roche). Cell lysates were centrifuged at 15,000g at 4 °C for 20 min to
remove cell debris. The cleared lysates were incubated with sepharose beads conjugated
with 1 µg GST-RBD at 4 °C for 2 h with constant agitation, and the beads were
precipitated by centrifugation at 1,000 r.p.m. for 10 min. After three washes, beads
were collected by centrifugation and boiled in 2× SDS loading buffer for 5 min.
Active RhoA was determined by western blotting.
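This protocol specifies one spin as relative centrifugal force (15,000g) and another as rotor speed (1,000 r.p.m.). Converting between the two requires the rotor radius, which is not given. The sketch below uses the standard relation RCF = 1.118 × 10⁻⁵ × r(cm) × rpm² with a hypothetical 8 cm radius, so the printed numbers only illustrate the conversion.

```python
from math import sqrt

K = 1.118e-5  # standard constant for RCF (×g) with radius in cm and speed in r.p.m.

def rpm_to_rcf(rpm, radius_cm):
    """Relative centrifugal force (×g) for a given speed and rotor radius."""
    return K * radius_cm * rpm ** 2

def rcf_to_rpm(rcf, radius_cm):
    """Rotor speed (r.p.m.) needed to reach a given RCF at a given radius."""
    return sqrt(rcf / (K * radius_cm))

RADIUS_CM = 8.0  # HYPOTHETICAL rotor radius; the Methods do not state one
print(f"1,000 r.p.m. ~ {rpm_to_rcf(1000, RADIUS_CM):.1f}g")
print(f"15,000g ~ {rcf_to_rpm(15000, RADIUS_CM):,.0f} r.p.m.")
```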

Experimental animals. Animals were supplied by the University Laboratory
Animal Services Centre and their use was approved by the Ethical Committee of
Animal Research (CUHK). The animals used in the present study included
Sprague-Dawley rats, apolipoprotein E-deficient (ApoE−/−) mice and EC-specific
YAP overexpression transgenic mice. Male mice or rats were used in all in vivo
studies. The animals were kept at a constant temperature (21 ± 1 °C) under a 12/12-h
light/dark cycle and had free access to water and standard chow unless specified.

Construction of EC-specific YAP overexpression mice. CAG-loxP-stop-loxP-Yap
(Yap-COE) mice were generated on a C57BL/6 background at the Model Animal Research
Center (Nanjing, China). Yap-COE mice were crossed with ApoE−/− mice and
then with Tie-2-Cre+/− mice. The 6-week-old ApoE−/−;Yap-COE;Tie-2-Cre+/− and
ApoE−/−;Yap-COE;Tie-2-Cre−/− mice were bred and housed in temperature-controlled
cages under a 12/12-h light/dark cycle with free access to water in
Tianjin Medical University Animal Center. Study protocols and the use of animals
were approved by the Institutional Animal Care and Use Committee of Tianjin
Medical University (Tianjin, China). The mice were fed a Western diet (Research
Diets, D12109) containing 40 kcal% fat, 1.25% cholesterol and 0.5% cholic acid for
4 weeks before being killed. Aortas were isolated to assess lesion formation and
distribution by Oil Red O staining. Aortic roots were stained for pJNK, α-SMA
and macrophages.

En face staining. Mouse aortas were fixed with 4% paraformaldehyde for 15 min.
After permeabilization/blocking in 0.05% Triton X-100 (in PBS) and 1% BSA
for 0.5 h at room temperature, aortas were incubated at 4 °C overnight in incubation
buffer containing 1% BSA and primary antibodies against YAP1 (Abcam,
ab52771) and CD31 (Abcam, ab24590). After being washed in PBS three times, aortas
were incubated with Alexa Fluor 488- and Alexa Fluor 594-conjugated secondary
antibodies (ZSGB-BIO, Beijing) for 1 h at room temperature. The fluorescent signal
was detected with a Leica confocal laser scanning microscope.

Disturbed flow in vivo. Stenosis of the abdominal aorta of rats was induced
using a U-shaped titanium clip, as described²⁹,³⁰. Briefly, after anaesthesia
with isoflurane, the rat was laid supine and a lower midline abdominal incision
was made; part of the intestine was gently lifted out of the abdominal cavity
and kept moist with saline throughout the surgical procedure. The aorta and the
left and right common iliac arteries were exposed and the accompanying vein was carefully
separated. The clip was held with a pair of forceps and placed around the isolated
segment (1 cm from the arterial bifurcation) to partly constrict the abdominal
aorta. The extent of clipping was controlled by placing a stopper of given size
between the two arms of the forceps. Two weeks later, the rat was euthanized by
intoxication with 100% carbon dioxide, and the aorta was perfusion-fixed with
4% (w/v) paraformaldehyde at 120 mm Hg. The fixed aorta was embedded in
paraffin blocks for immunohistochemical staining.

Partial ligation of the carotid artery was performed as described previously³¹. Briefly,
ApoE−/− mice were anaesthetized by intraperitoneal injection of a xylazine (10 mg/kg)
and ketamine (80 mg/kg) mixture. The left carotid artery was exposed through a ventral
midline incision (4–5 mm) in the neck. The left external carotid, internal carotid and
occipital arteries were ligated, while the superior thyroid artery was left intact. After
surgery, mice were monitored in a chamber on a heating pad until recovery and were fed
the Western diet from immediately after surgery until they were killed.

Immunohistochemical staining. Immunohistochemical staining was performed
on serial sections (5 µm thick) of paraffin-embedded rat abdominal aortas and
ApoE−/− mouse aortas using pYAP (Cell Signaling) and EC- and SMC-specific markers
(vWF and α-SMA, respectively) (Merck Millipore). Briefly, the sections
were de-waxed in xylene, rehydrated in descending grades of alcohol and
permeabilized by incubating in sodium citrate for 10 min at 95 °C. Sections
were cooled to room temperature and blocked with blocking reagent (Merck
Millipore) for 30 min. One section was incubated with antibody against pYAP
(1:100) overnight at 4 °C, followed by Alexa Fluor 594-conjugated goat anti-rabbit
IgG (1:1,000; Invitrogen) secondary antibody in blocking reagent for 1 h at room
temperature. A second section was incubated with antibodies against vWF
and α-SMA (1:100 each) overnight at 4 °C, followed by Alexa Fluor 594-conjugated
goat anti-rabbit IgG and Alexa Fluor 488-conjugated goat anti-mouse IgG
(1:1,000; Invitrogen) secondary antibodies in blocking reagent for 1 h at room
temperature. Nuclei were co-stained with DAPI (Invitrogen) in PBS for 5 min. The
sections were spin-dried and mounted with ProLong Gold (Invitrogen) on glass
coverslips. Images were acquired and analysed using a Zeiss fluorescence
microscope with AxioVision image analysis software.

Oral administration of MnCl2 in ApoE−/− mice. ApoE−/− mice (male, 12 weeks old) were fed a Western diet, and MnCl2 was administered through voluntary water consumption. The water consumption rate was predetermined by monitoring the volume of water remaining. MnCl2 was added to the drinking water to achieve a dose of 5 mg/kg body weight. Body weight and water consumption were measured weekly, and the MnCl2 supplementation was adjusted accordingly. After 3 months on the Western diet, the mice were killed and atherosclerotic plaque formation was determined by Oil Red O staining.
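
The Methods give a target dose (5 mg/kg body weight) and say the supplementation was re-adjusted weekly from body weight and water intake. They do not give the resulting water concentration. The sketch below shows that arithmetic under stated assumptions: the dose is taken to be per day and expressed as MnCl2 mass. The function name and the example weight and intake are hypothetical, not values from the study.

```python
def mncl2_water_concentration(dose_mg_per_kg_day: float,
                              body_weight_g: float,
                              water_intake_ml_per_day: float) -> float:
    """Return the MnCl2 concentration (mg/ml) needed in drinking water so that
    a mouse drinking `water_intake_ml_per_day` receives the target daily dose.
    Assumes (hypothetically) the dose is per day and refers to MnCl2 mass."""
    if body_weight_g <= 0 or water_intake_ml_per_day <= 0:
        raise ValueError("body weight and water intake must be positive")
    daily_dose_mg = dose_mg_per_kg_day * body_weight_g / 1000.0  # g -> kg
    return daily_dose_mg / water_intake_ml_per_day


# Hypothetical weekly recalculation: 25 g mouse, 5 ml water per day, 5 mg/kg/day.
conc = mncl2_water_concentration(5.0, 25.0, 5.0)
print(f"{conc * 1000:.1f} ug/ml")  # prints 25.0 ug/ml
```

Re-running the calculation with each week's measured weight and intake corresponds to the weekly adjustment described above.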

Oil Red O staining for atherosclerotic plaques in mouse aorta. ApoE−/− mice were killed by CO2 asphyxiation. Mouse aortas were dissected in cold PBS and cut open to expose the atherosclerotic plaques. After fixation in 4% formaldehyde for 16 h at 4°C, the tissues were rinsed first in water for 10 min and then in 60% isopropanol. The aortas were stained with Oil Red O for 15 min with gentle shaking, rinsed again in 60% isopropanol and then rinsed three times in water. The samples were mounted on cover slides with the endothelial surface facing upwards. Images were recorded using an HP Scanjet G4050. Plaque areas were measured using National Institutes of Health ImageJ software and expressed relative to the total vascular area.
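
The plaque measure above is a ratio of two areas. The sketch below reproduces that ratio from pixel counts, assuming two binary masks have already been produced by a thresholding step: one for the opened aorta and one for Oil Red O-positive pixels. It illustrates the calculation only; it is not the authors' ImageJ workflow, and the masks are toy data.

```python
from typing import Sequence

Mask = Sequence[Sequence[bool]]


def plaque_area_percent(plaque_mask: Mask, vessel_mask: Mask) -> float:
    """Percentage of the opened-aorta area covered by Oil Red O-positive plaque.

    Both masks are binary images of identical shape: `vessel_mask` marks pixels
    inside the dissected aorta, `plaque_mask` marks thresholded red pixels.
    Plaque pixels falling outside the vessel are ignored."""
    if len(plaque_mask) != len(vessel_mask):
        raise ValueError("masks must have the same number of rows")
    vessel_px = plaque_px = 0
    for plaque_row, vessel_row in zip(plaque_mask, vessel_mask):
        if len(plaque_row) != len(vessel_row):
            raise ValueError("masks must have the same number of columns")
        for is_plaque, is_vessel in zip(plaque_row, vessel_row):
            if is_vessel:
                vessel_px += 1
                if is_plaque:
                    plaque_px += 1
    if vessel_px == 0:
        raise ValueError("vessel mask is empty")
    return 100.0 * plaque_px / vessel_px


# Toy 2 x 4 example: the whole field is vessel, 2 of 8 pixels are plaque.
vessel = [[True] * 4, [True] * 4]
plaque = [[True, True, False, False], [False] * 4]
print(plaque_area_percent(plaque, vessel))  # prints 25.0
```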

Human aortic specimens. The experiments were approved by the Hospital Human Subjects Review Committee (IRB approval number TSGHIRB 2-103-05-132) of Tri-Service General Hospital in Taipei. They were conducted under the guidelines established by the Ethics Review Board of National Health Research Institutes, Taiwan. Written informed consent was obtained from all individuals. Human aortic tissue specimens were obtained during emergency aortic surgery from patients with acute type-A aortic dissection. In all these patients, the diseased segments of aorta (that is, dissecting aortic aneurysm) were resected and replaced by an artificial interposition graft. Specimens were fixed in paraformaldehyde, embedded in paraffin and cut into 5-μm sections. YAP Ser127 phosphorylation was determined by immunofluorescence imaging.

RNA-sequencing. HUVECs were transfected with pWCXIH-Flag-YAP-S127A (a gift from K. Guan, Addgene 33092) and 3×Flag pCMV5-TOPO TAZ (S89A)


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


LETTER 


(a gift from J. Wrana, Addgene 24815), or with pEGFP-N1, using the Neon transfection system (Invitrogen, USA). Four hours after transfection, cells were harvested and RNA was extracted using the RNeasy Mini Kit (Qiagen, Germany). The extracted RNA samples were sent to the Beijing Genomics Institute (BGI) for RNA-sequencing analysis. Thresholds of P < 0.05 and fold change > 1.5 were used to define differentially regulated genes. DAVID tools were used for pathway enrichment analysis and ClueGO was used for Gene Ontology analysis.
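
The two thresholds above can be applied as a simple filter. The sketch below makes two assumptions that the text does not state. First, the 1.5-fold cut-off is applied in both directions, so a gene is called down-regulated when its ratio is below 1/1.5. Second, the P values are taken as given, with no statement on multiple-testing adjustment. The gene names and values are hypothetical, and this is not BGI's pipeline.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str
    fold_change: float  # expression ratio, e.g. CA-YAP/TAZ versus pEGFP-N1 control
    p_value: float


def split_regulated(results, p_cutoff=0.05, fc_cutoff=1.5):
    """Return (up, down) gene lists passing both thresholds.

    Assumes the fold-change cut-off applies in both directions, so a gene is
    called down-regulated when its ratio is below 1 / fc_cutoff."""
    up, down = [], []
    for r in results:
        if r.p_value >= p_cutoff or r.fold_change <= 0:
            continue  # not significant, or an invalid ratio
        if r.fold_change > fc_cutoff:
            up.append(r.gene)
        elif r.fold_change < 1.0 / fc_cutoff:
            down.append(r.gene)
    return up, down


# Hypothetical input, not data from the study.
demo = [
    GeneResult("gene_a", 2.0, 0.01),   # up
    GeneResult("gene_b", 0.5, 0.02),   # down (0.5 < 1/1.5)
    GeneResult("gene_c", 1.2, 0.001),  # fold change too small
    GeneResult("gene_d", 3.0, 0.20),   # not significant
]
print(split_regulated(demo))  # prints (['gene_a'], ['gene_b'])
```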

Haemodynamics study in vitro. An Ibidi flow system (IBIDI, Germany) was used to generate USS and disturbed flow (12 dyn cm−2 for USS; 0.56 dyn cm−2 at 1 Hz for disturbed flow). μ-Slide I 0.4 Luer slides (IBIDI, LLC) were used for immunofluorescence studies. Each slide was coated with 50 μg/ml fibronectin for 24 h, and seven thousand HUVECs were seeded onto it. The cells were adapted for 6 h to medium containing 2% FBS (10% fatty-acid-free BSA for disturbed flow), after which the slides were mounted onto the Ibidi flow system. For immunostaining of USS-induced YAP/TAZ nuclear export, cells were subjected to USS for 6 h.

For western blotting and reverse transcription real-time PCR analysis, the μ-slides were replaced with a custom-built flow chamber that could accommodate more cells. Glass slides (75 mm × 38 mm; Corning) were coated with fibronectin (50 μg/ml). HUVECs were seeded on the slides and allowed to attach for 16 h. For USS, the medium was replaced with EGM supplemented with 2% FBS for 6 h. For disturbed flow, cells were incubated in EGM supplemented with 10% fatty-acid-free BSA (Sigma). The slides were mounted onto the flow chamber and connected to the Ibidi flow system, and the cells were then subjected to USS or disturbed flow. Shear stress was applied for:
- 15 min for USS-induced YAP phosphorylation, unless otherwise noted;
- 6 h for USS-induced YAP translocation;
- 4 h for reverse transcription real-time PCR analysis, which was sufficient to inhibit the expression of YAP/TAZ target genes;
- 48 h for reporter gene assays.
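
The text specifies shear stress setpoints but not how they map to pump flow in the custom-built chamber. For steady laminar flow in a parallel-plate chamber whose width is much larger than its height, the standard result is tau = 6·mu·Q/(w·h^2). The sketch below applies that formula in both directions. The chamber dimensions and medium viscosity are hypothetical, not values from the text. The oscillatory 1 Hz disturbed-flow regime is not captured by this steady-flow formula.

```python
def wall_shear_stress(flow_ml_per_min: float, viscosity_poise: float,
                      width_cm: float, height_cm: float) -> float:
    """Wall shear stress (dyn/cm^2) for steady laminar flow between parallel
    plates: tau = 6 * mu * Q / (w * h^2), valid when width >> height."""
    q_cm3_per_s = flow_ml_per_min / 60.0  # 1 ml == 1 cm^3
    return 6.0 * viscosity_poise * q_cm3_per_s / (width_cm * height_cm ** 2)


def flow_for_shear(tau_dyn_per_cm2: float, viscosity_poise: float,
                   width_cm: float, height_cm: float) -> float:
    """Inverse of wall_shear_stress: flow rate (ml/min) giving the target shear."""
    q_cm3_per_s = tau_dyn_per_cm2 * width_cm * height_cm ** 2 / (6.0 * viscosity_poise)
    return q_cm3_per_s * 60.0


# Hypothetical chamber: 2.5 cm wide, 0.025 cm (250 um) high; assumed medium
# viscosity 0.0072 poise (dyn*s/cm^2). None of these values is given in the text.
q = flow_for_shear(12.0, 0.0072, 2.5, 0.025)
print(f"{q:.1f} ml/min for 12 dyn/cm^2")  # prints 26.0 ml/min for 12 dyn/cm^2
```

Because tau scales with 1/h^2, small errors in channel height dominate the uncertainty in the delivered shear.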

Plasmid construction. To construct the reporter plasmids for adhesion molecules, human genomic DNA was purified from HUVECs using a Universal Genomic DNA Extraction Kit Ver 3.0 (Takara, Japan). The promoters of ICAM1, SELE, CCL2 and CXCL1 were PCR-amplified from human genomic DNA using the primers listed in Supplementary Table 1. The amplified fragments were:
- a 2.1 kb fragment (−1784 to +328) of the ICAM1 promoter;
- a 2.2 kb fragment (−1807 to +475) of the SELE promoter;
- a 4 kb fragment (−3992 to +73) of the CCL2 promoter;
- a 1.3 kb fragment (−1256 to +84) of the CXCL1 promoter.

The PCR products were gel-purified using a gel extraction kit (Takara, Japan) and digested with restriction enzymes. The digested fragments were gel-purified and ligated into the pGL3 reporter plasmid digested with the corresponding restriction enzymes. The ligation products were then heat-inactivated at 65°C for 15 min and transformed into DH5α competent cells.
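
The fragment sizes above can be checked against their coordinates. The sketch assumes the common promoter-numbering convention, which the text does not state: +1 is the transcription start site, −1 is the first upstream base, there is no position 0, and both ends are inclusive. The kb sizes quoted in the text are rounded.

```python
def fragment_length_bp(start: int, end: int) -> int:
    """Inclusive length of a promoter fragment whose ends are numbered relative
    to the transcription start site (+1), with -1 the first upstream base and
    no position 0 (an assumed convention)."""
    if start == 0 or end == 0:
        raise ValueError("position 0 does not exist in TSS-relative numbering")
    if end < start:
        raise ValueError("end must lie downstream of start")
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # skip the non-existent position 0 when crossing the TSS
    return length


promoter_fragments = {
    "ICAM1": (-1784, 328),
    "SELE": (-1807, 475),
    "CCL2": (-3992, 73),
    "CXCL1": (-1256, 84),
}
for gene, (start, end) in promoter_fragments.items():
    print(f"{gene}: {fragment_length_bp(start, end)} bp")
```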

The Pro32Pro33 integrin β3 mutant was derived from pcDNA3.1-beta-3 (a gift from T. Springer, Addgene plasmid 27289) by point mutation.

Primers used for plasmid construction are listed in Supplementary Table 1.

Adenovirus production. To generate the adenovirus shuttle vector pShuttle-U6, the U6 promoter and the 1.9 kb stuffer sequence were excised from pLKO.1 (a gift from D. Root, Addgene plasmid 10878) with NotI/XhoI. They were then ligated into the pShuttle plasmid pre-digested with the same restriction enzymes. Short hairpin RNA targeting mouse Taz was generated following a protocol similar to the Addgene pLKO.1 shRNA plasmid construction protocol. The Taz shRNA sequence TRCN0000095951, validated in the MISSION shRNA collection (Sigma-Aldrich), was used to generate the shuttle plasmids for Taz shRNA.

Recombinant adenovirus was generated using the AdEasy system. Briefly, the pShuttle-U6 vector containing the shRNA was digested with PmeI and co-transformed with the adenoviral backbone plasmid pAdEasy-1 into E. coli BJ5183 cells for homologous recombination. Positive recombinants were linearized by PacI digestion and transfected into HEK-293A cells for virus packaging. The medium and cells were collected once the cytopathic effect was apparent. The virus was released by three freeze-thaw cycles, and cell debris was removed by centrifugation at 3,000 r.p.m. for 15 min. The virus in the supernatant was concentrated by PEG precipitation, followed by dialysis against saline with 100K MWCO dialysis tubing (Spectrum Labs).

Lentivirus production. Lentiviral shuttle plasmids for YAP (TRCN0000300325), TAZ (TRCN0000370007), Gα13 (TRCN0000036885) and ITGB3 (TRCN0000003236) shRNA were purchased from Sigma. HEK-293FT cells were co-transfected with a plasmid cocktail containing 1 μg of the shuttle plasmid, 750 ng of psPAX2 packaging plasmid and 250 ng of pMD2.G envelope plasmid. The medium was changed 15 h after transfection. At 48 and 72 h after transfection, the medium containing the lentiviral particles was harvested and passed through 0.45-μm filters to remove cell debris. The virus was precipitated with PEG and resuspended in PBS containing 4% sucrose. The lentiviral solutions were then aliquoted into vials and stored at −80°C.
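
The packaging cocktail above uses a 4:3:1 mass ratio of shuttle : psPAX2 : pMD2.G. The text does not state the culture vessel size. If a different total DNA amount were needed, one simple approach, assumed here rather than taken from the source, is to scale all three plasmids by the same factor. The helper name is hypothetical.

```python
# Mass of each plasmid per transfection as stated in the Methods (ng).
BASE_MIX_NG = {"shuttle": 1000.0, "psPAX2": 750.0, "pMD2.G": 250.0}


def scaled_packaging_mix(total_dna_ng: float) -> dict:
    """Scale the shuttle : psPAX2 : pMD2.G mix (4 : 3 : 1 by mass) to a new
    total DNA amount, keeping the ratio fixed (hypothetical helper)."""
    if total_dna_ng <= 0:
        raise ValueError("total DNA must be positive")
    factor = total_dna_ng / sum(BASE_MIX_NG.values())
    return {name: ng * factor for name, ng in BASE_MIX_NG.items()}


print(scaled_packaging_mix(4000.0))
# prints {'shuttle': 2000.0, 'psPAX2': 1500.0, 'pMD2.G': 500.0}
```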


Construction of AAV shuttle plasmid for CA-YAP/TAZ overexpression. YAP1 S127A was amplified from pWCXIH-Flag-YAP-S127A (a gift from K. Guan, Addgene 33092) and ligated into pAAV-MCS (Stratagene) to generate the pAAV-YAP1 S127A shuttle plasmid. A similar strategy was used to generate pAAV-TAZ S89A from 3×Flag pCMV5-TOPO TAZ (S89A) (a gift from J. Wrana, Addgene 24815).

Construction of endothelial-specific AAV-mediated CRISPR/Cas9 shuttle plasmid. pX601-AAV-CMV::NLS-SaCas9-NLS-3xHA-bGHpA;U6::BsaI-sgRNA (a gift from F. Zhang, Addgene plasmid 61591) was used to generate the EC-specific Cas9 for in vivo genome editing of Yap. Three sgRNA sequences for Yap were predicted by CCTop (CRISPR/Cas9 target online predictor). The human endothelium-specific ICAM2 promoter was synthesized by GenScript and used to replace the CMV promoter in pX601-AAV-CMV.

Primers used for sgRNA are listed in Supplementary Table 1.

Endothelial-enhanced AAV packaging. The shuttle plasmids were co-transfected into HEK-293T cells together with two other plasmids: the endothelial-enhanced RGDLRVS-AAV9-cap plasmid (provided by O. J. Müller, Universität Heidelberg, Germany) and the pHelper plasmid (Stratagene). At 72 h after co-transfection, the AAV particles were isolated according to the protocol reported in ref. 38. Briefly, the cells were harvested and resuspended in 1× restore buffer, and the nuclei were extracted by homogenization. Viral particles were extracted using nuclear lysis buffer and concentrated with PEG. They were then dialysed against saline with 100K MWCO dialysis tubing (Spectrum Labs) to remove impurities, and concentrated again. The viral titre was determined by qPCR and adjusted to 10^10 plaque-forming units per ml in PBS containing 4% sucrose.
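
The Methods state only that the AAV titre was determined by qPCR. A common way to do this is absolute quantification against a standard curve of known template amounts. The sketch shows that interpolation step: it fits Ct = slope·log10(copies) + intercept by least squares, then inverts the fit for a sample. The dilution series and Ct values are hypothetical. Converting copies per reaction into the units reported above would require dilution and volume factors that the text does not describe.

```python
def fit_standard_curve(log10_copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    if n != len(ct_values) or n < 2:
        raise ValueError("need at least two paired standards")
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate template copies in a reaction."""
    return 10 ** ((ct - intercept) / slope)


def efficiency(slope):
    """Amplification efficiency implied by the slope (1.0 == 100%)."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical tenfold plasmid dilution series (10^3 to 10^7 copies).
standards = [3, 4, 5, 6, 7]
cts = [30.1, 26.8, 23.5, 20.2, 16.9]
slope, intercept = fit_standard_curve(standards, cts)
sample_copies = copies_from_ct(21.85, slope, intercept)
print(f"slope={slope:.2f}, efficiency={efficiency(slope):.1%}, copies={sample_copies:.3g}")
```

A slope near −3.32 corresponds to about 100% efficiency. Standards far from that value usually indicate a problem with the dilution series.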

Virus administration. For adenovirus-mediated Taz shRNA, viruses (10° plaque-forming units) were administered by tail vein injection to ApoE−/− mice (male, 12 weeks old) that had been fed a Western diet (Research Diets) for 4 weeks. The mice were then fed the Western diet for 2 more months. Atherosclerotic plaque formation was visualized by Oil Red O staining.

For AAV-mediated CA-YAP/TAZ overexpression and YAP-Cas9, the viruses (10° plaque-forming units) were administered to ApoE−/− mice (male, 12 weeks old) by tail vein injection. This was done before the mice were fed the Western diet or underwent carotid partial ligation surgery.

Statistical analysis. Statistical analyses were performed using GraphPad Prism 5.0. Sample sizes were not predetermined by statistical methods. The samples were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment. At least three independent experiments were performed for all biochemical experiments, and representative images are shown. Results represent mean ± s.e.m. Student's t-test (unpaired, two-tailed) was used for the analysis. No samples, mice or data points were excluded from the reported analysis. P values less than 0.05 were regarded as significant.
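
The analyses above were run in GraphPad Prism. The sketch below shows only the arithmetic behind the reported quantities: mean ± s.e.m., and the unpaired Student's t statistic with pooled variance and n_a + n_b − 2 degrees of freedom. The standard library has no t distribution. To get the two-tailed P value, one option is scipy.stats.ttest_ind(a, b), whose default equal_var=True gives Student's test. The measurement values are hypothetical.

```python
import math
from statistics import mean, stdev


def mean_sem(values):
    """Mean and standard error of the mean (sample s.d. / sqrt(n))."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    return mean(values), stdev(values) / math.sqrt(len(values))


def student_t_unpaired(a, b):
    """Unpaired Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two values")
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    if pooled_var == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical triplicate measurements (n = 3 per group).
control = [1.0, 2.0, 3.0]
treated = [4.0, 5.0, 6.0]
print(mean_sem(control))
print(student_t_unpaired(control, treated))
```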

Data availability. The RNA-seq data that support the findings of this study have been deposited in the BioSamples database (https://www.ebi.ac.uk/biosamples/) under accession number SAMN04565728. All other data are available from the corresponding authors upon reasonable request.

Extended Data Figure 1 | USS and disturbed flow oppositely regulate YAP/TAZ activity. a, Immunoblotting showing that USS induces YAP phosphorylation in human aortic ECs. b, Summarized data for USS-induced YAP nuclear export (n = 5; *P < 0.05 by two-tailed unpaired t-test). c, TAZ is decreased in nuclear fractions and increased in cytoplasmic fractions in HUVECs exposed to USS for 6 h. TAZ expression was detected by immunoblotting after cell fractionation. d, Disturbed flow suppresses YAP phosphorylation in human aortic ECs. e, Immunoblotting showing that disturbed flow increases CTGF expression in HUVECs. All immunoblotting experiments were repeated three times and representative results are shown. f, g, YAP/TAZ knockdown attenuates disturbed-flow-induced gene expression of (f) CTGF and (g) CYR61 (n = 3; *P < 0.05 by two-tailed unpaired t-test). NS, not significant. h, Summarized data for en face staining of relative nuclear YAP level in mouse aorta (n = 6; MAA, inner = 35; MAA, outer = 33; *P < 0.05 by two-tailed unpaired t-test).


Extended Data Figure 2 | USS inhibits YAP/TAZ through the integrin-Gα13-RhoA pathway. a, MnCl2 (0.5 mM) promotes YAP phosphorylation, as shown by immunoblotting. b, MnCl2 reduces nuclear YAP/TAZ levels in HUVECs. c, The Gα13-inhibiting peptide mSRI reverses MnCl2-induced YAP/TAZ reporter (8×GTIIC-luc) gene activity (n = 3; *P < 0.05 by two-tailed unpaired t-test). d, The RGD-containing peptide GRGDSP downregulates expression of YAP/TAZ downstream target genes (n = 3; *P < 0.05 by two-tailed unpaired t-test). e, f, The Pro32Pro33 mutation in integrin β3 inhibits YAP/TAZ transactivation in HUVECs, as verified by suppressed (e) expression of YAP/TAZ target genes and (f) YAP/TAZ reporter gene activity (n = 3; *P < 0.05 by two-tailed unpaired t-test). g, Gα13 or integrin β3 knockdown reverses MnCl2-induced YAP/TAZ nuclear export in HUVECs. h, Gα13 knockdown reverses RGD-containing peptide-mediated CTGF and CYR61 suppression in HUVECs (n = 3; *P < 0.05 by two-tailed unpaired t-test). i, The Gα13-inhibiting peptides mSRI and mP6 reverse MnCl2-induced (5 min) pYAP but not total YAP expression in HUVECs. The experiments were repeated at least three times and representative results are shown.

Extended Data Figure 3 | YAP/TAZ activation increases JNK activity. a, Heatmap of mRNA sequencing results showing that CA-YAP/TAZ promotes expression of pro-inflammatory genes. b, CA-YAP/TAZ increases the promoter activity of adhesion molecules in HUVECs (n = 4; *P < 0.05 by two-tailed unpaired t-test). c, Summarized data showing that CA-YAP/TAZ overexpression increases monocyte attachment to HUVECs (n = 4; *P < 0.05 by two-tailed unpaired t-test). d, e, Immunoblotting showing JNK phosphorylation in HUVECs exposed to (d) USS or (e) disturbed flow for different durations. Experiments were repeated three times and representative results are shown. f, YAP/TAZ knockdown suppresses basal and PMA-induced JNK phosphorylation in HUVECs. g, Overexpression of dominant-negative YAP (YAP S94A) inhibits PMA-induced AP-1 reporter gene activity (n = 3; *P < 0.05 by two-tailed unpaired t-test). h, CA-YAP/TAZ increases AP-1 reporter gene activity in HUVECs (n = 4; *P < 0.05 by two-tailed unpaired t-test); PMA was used as a positive control for monitoring AP-1 activity.

overexpression transgenic mice. b, En face staining showing increased YAP expression in endothelial cells of the Tie2-Cre; Yap-COE; ApoE−/− mice (n = 10). c, Summarized data showing that EC-specific YAP overexpression increases JNK phosphorylation (n = 10; *P < 0.05 by two-tailed unpaired t-test). [...] e, f, EC-specific YAP overexpression does not affect serum levels of (e) cholesterol or (f) triglycerides (n = 10; *P < 0.05 by two-tailed unpaired t-test).

Extended Data Figure 5 | Inhibiting TAZ activity by shRNA or MnCl2 administration delays atherogenesis independently of lipid metabolism, whereas activating YAP/TAZ by AAV-mediated CA-YAP/TAZ overexpression accelerates atherosclerotic plaque formation. a, Immunoblotting showing that adenovirus-mediated TAZ shRNA suppressed TAZ expression. b, TAZ knockdown delayed Western-diet-induced plaque formation in ApoE−/− mice (n = 5; *P < 0.05 by two-tailed unpaired t-test). c, TAZ knockdown-suppressed plaque formation in ApoE−/− mice is not due to a change in lipid profile. Data are expressed as mean ± s.e.m. (n = 5; *P < 0.05 by two-tailed unpaired t-test). d, Immunoblotting showing increased YAP expression in mice injected with AAV expressing CA-YAP/TAZ. e, f, Oil Red O staining (e) and summarized data (f) for CA-YAP/TAZ-induced exacerbation of plaque formation (n = 5; *P < 0.05 by two-tailed unpaired t-test). g, AAV-mediated CA-YAP/TAZ overexpression does not affect the lipid profile in ApoE−/− mice. h, i, Oral administration of MnCl2 does not affect (h) the lipid profile or (i) SOD activity in liver. Data are expressed as mean ± s.e.m. (n = 5; *P < 0.05 by two-tailed unpaired t-test).

Extended Data Figure 6 | Summary of western blotting data. a, Endothelium removal reduces YAP level in mouse aorta. b, USS increases YAP phosphorylation. c, Disturbed flow reduces YAP phosphorylation. d, Thoracic aorta expresses higher levels of pYAP than aortic arch. e, Overexpression of a loss-of-function mutant of integrin β3 (β3Δcyto) suppresses USS-induced pYAP. f, The RGD-containing peptide GRGDSP induces pYAP. g, Gα13 or integrin β3 knockdown reverses MnCl2-induced pYAP. h, The integrin gain-of-function mutation Pro32Pro33 increases pYAP. i, Constitutively activated RhoA (CA-RhoA) reverses USS-induced pYAP. Data: n = 6 for a and n = 3 for other panels; *P < 0.05 by two-tailed unpaired t-test.

reverses USS-induced pYAP. c, The Gα13 inhibitor SRI reverses USS-induced pYAP. d-h, Immunoblotting detection of (d) pYAP, (e) YAP, (f) TAZ, [...] by two-tailed unpaired t-test.

Extended Data Table 1 | Drugs and concentrations used for the YAP/TAZ inhibition test

Drug              Stock concentration   Working concentration   Company
Adenosine         10 mM                 10 μM                   Sigma
Apelin            1 mM                  100 nM                  Sigma
Exendin 4         100 μM                10 nM                   Sigma
[...] (niacin)    500 mM                3 to 5 mM               Sigma
ApoA1             1 mg/mL               10 μg/mL                Sigma
Rosuvastatin      10 mM                 10 μM                   Cayman
Simvastatin       100 mM                1 μM                    Cayman
Atorvastatin      10 mM                 1 μM                    Cayman

NLRC3 is an inhibitory sensor of PI3K-mTOR pathways in cancer

Rajendra Karki, Si Ming Man, R. K. Subbarao Malireddi, Sannula Kesavardhana, Qifan Zhu, Amanda R. Burton, Bhesh Raj Sharma, Xiaopeng Qi, Stephane Pelletier, Peter Vogel, Philip Rosenstiel & Thirumala-Devi Kanneganti


NLRs (nucleotide-binding domain and leucine-rich repeats) form a large family of cytoplasmic sensors that regulate an extraordinarily diverse range of biological functions. One of these functions is to contribute to immunity against infectious diseases, but dysregulation of their functional activity leads to the development of inflammatory and autoimmune diseases. Cytoplasmic innate immune sensors, including NLRs, are central regulators of intestinal homeostasis. NLRC3 (also known as CLR16.2 or NOD3) is a poorly characterized member of the NLR family. It was identified in a genomic screen for genes encoding proteins bearing leucine-rich repeats (LRRs) and nucleotide-binding domains. Expression of NLRC3 is drastically reduced in the tumour tissue of patients with colorectal cancer compared to healthy tissues, pointing to a potential, as yet undefined, function for this sensor in the development of cancer. Here we show that mice lacking NLRC3 are hyper-susceptible to colitis and colorectal tumorigenesis. The effect of NLRC3 is most dominant in enterocytes. In these cells, NLRC3 suppresses activation of the mTOR signalling pathways and inhibits cellular proliferation and stem-cell-derived organoid formation. NLRC3 associates with PI3Ks and blocks activation of the PI3K-dependent kinase AKT following binding of growth factor receptors or Toll-like receptor 4. These findings reveal a key role for NLRC3 as an inhibitor of the mTOR pathways, mediating protection against colorectal cancer.

Previous studies have shown that NLRC3 functions as a negative regulator of signalling pathways activated by Toll-like receptors (TLRs) and the DNA sensor STING in response to pathogen-associated molecular patterns or to virus infection. The physiological role of NLRC3 has, however, remained largely unknown. Using an established mouse model of colitis-associated colorectal tumorigenesis, we investigated the role of NLRC3 in colorectal cancer. To do this, we injected mice intraperitoneally with azoxymethane, followed by three rounds of dextran sulfate sodium (DSS) treatment (Extended Data Fig. 1a). All time points referred to hereafter indicate the number of days after injection of azoxymethane. The number of tumours was quantified at day 80. Quantitative reverse-transcriptase PCR analysis revealed reduced expression of the gene encoding NLRC3 in tumour tissue compared with non-tumour-associated tissue in the colon of wild-type mice at day 80 (Extended Data Fig. 1b).

We injected cohorts of co-housed wild-type and Nlrc3−/− mice with azoxymethane, followed by three rounds of DSS treatment, and examined the prevalence of tumours in the colon of these mice at day 80 (Extended Data Fig. 1a, c). We found that Nlrc3−/− mice lost more body weight after the first two rounds of DSS treatment compared to wild-type mice and developed significantly more tumours (Fig. 1a-d). Histological hallmarks of colon damage were more frequently identified in Nlrc3−/− mice than in the corresponding regions of wild-type mice. These hallmarks were thickening of the colon, inflammation, ulceration, hyperplasia and greater extent or severity of damage, and they occurred in the middle and distal colon and the rectum (Fig. 1e, f and Extended Data Fig. 1d).

Figure 1 | NLRC3 prevents colorectal tumorigenesis. a, Percentage change in body weight during azoxymethane and DSS treatment; day 0 is the time of azoxymethane injection. b, Representative images of colon tumours in mice 80 days after injection of azoxymethane. c, d, Number (c) and size (d) of colon tumours in wild-type (WT; n = 37) and Nlrc3−/− (n = 35) mice. e, f, Haematoxylin and eosin staining (e) and histological scores (f) of tumours in colon as shown in b. Scale bars, 2,500 μm (whole colon, top), 200 μm (magnified, bottom). g, Number of colon tumours in littermate wild-type (n = 10), Nlrc3+/− (n = 5) and Nlrc3−/− (n = 6) mice. h, Number of colon tumours in bone-marrow chimaera mice treated as in b. Wild-type mice transplanted with wild-type bone marrow (n = 10); Nlrc3−/− mice transplanted with wild-type bone marrow (n = 9); wild-type mice transplanted with Nlrc3−/− bone marrow (n = 8); and Nlrc3−/− mice transplanted with Nlrc3−/− bone marrow (n = 9). i, Number of colon tumours in littermate Nlrc3fl/fl (n = 8), LysMCre Nlrc3fl/fl (n = 11), Vav1Cre Nlrc3fl/fl (n = 9), Vil1Cre Nlrc3fl/fl (n = 7) and Nlrc3−/− (n = 8) mice treated as in a. Each symbol represents an individual mouse (c, f-i). *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001; NS, not statistically significant by one-way ANOVA (a, g-i) or two-tailed t-test (c, f). Data are from three (a-f) or two independent experiments (g-i) and are presented as mean ± s.e.m. in a, c, f-i.

Figure 2 | NLRC3 suppresses overt proliferation. a, Images and quantification of the number of Ki67+ (left) and PCNA+ (right) cells in each crypt of wild-type (day 0, n = 5; day 14, n = 8) and Nlrc3−/− (day 0, n = 5; day 14, n = 8) mice. b, Images (left) and quantifications (right) of the number (top) and size (bottom) of mouse intestinal organoids. c, Proliferation of the HCT116 cell line. d, Proliferation of primary mouse fibroblasts. At least 25 crypts were counted in each animal (a). Scale bars, 200 μm (a) or 50 μm (b). Each symbol represents one crypt (a) or one organoid (b). **P < 0.01; ***P < 0.001; ****P < 0.0001; NS, not statistically significant by two-tailed t-test. Data are from one experiment representative of two (a, b) or three independent experiments (c, d) and are presented as mean ± s.e.m. in a-d.


All Nlrc3−/− mice developed high-grade dysplasia, whereas wild-type mice developed low-grade dysplasia (Extended Data Fig. 1e). We found that 63% of the Nlrc3−/− mice were positive for adenocarcinoma in the colon, compared with 0% of the wild-type controls at day 80 (Extended Data Fig. 1f). The effect of NLRC3 deficiency on tumorigenesis induced by azoxymethane and DSS was gene-dose dependent (Fig. 1g). However, NLRC3 does not appear to affect the normal mouse intestine or colon (Extended Data Fig. 1g).

After only a single round of DSS treatment, Nlrc3−/− mice lost significantly more body weight than their wild-type counterparts and showed more severe shortening of, and damage to, the colon (Extended Data Fig. 2a-c). Certain members of the NLR protein family can form inflammasomes, driving maturation of IL-18, a cytokine important for mediating protection against colitis-associated tumorigenesis. We did not observe differential production of IL-18 in wild-type and Nlrc3−/− mice at day 14 or at day 80 (Extended Data Fig. 2d). Instead, several mediators were elevated in colon tissue of Nlrc3−/− mice compared to wild-type mice at day 14 (Extended Data Fig. 2d-g). These were the other inflammasome-associated cytokine IL-1β; the inflammasome-independent cytokines IL-6, TNF and G-CSF; and the chemokines KC (also known as CXCL1), MCP-1 (also known as CCL2) and MIP-1α (also known as CCL3). We further confirmed these results and also found increased levels of circulating IL-6, G-CSF, KC and MIP-1α in the sera of Nlrc3−/− mice compared to wild-type mice at day 14 (Extended Data Fig. 2h). The expression of IL-17 and IL-22 was also elevated in the colon tissue of Nlrc3−/− mice compared to wild-type mice at day 14, whereas the expression of IL-23, IFN-β and IFN-γ remained unchanged (Extended Data Fig. 3a). Nlrc3−/− mice had elevated levels of many pro-inflammatory mediators at day 14. Consistent with this, we observed increased levels of IκBα and STAT3 phosphorylation in the colon tissue of Nlrc3−/− mice compared to wild-type mice (Extended Data Fig. 3b). However, differential phosphorylation of ERK was not observed (Extended Data Fig. 3b). Global increases in the production of inflammatory mediators and the activation of immune signalling pathways reflected the hyper-susceptibility of Nlrc3−/− mice to colitis.

Figure 3 | NLRC3 controls mTOR signalling pathways. a, Immunoblot of mouse colon tissue and densitometric quantification at day 14. P-, phosphorylated protein. b, Immunohistochemical staining of mouse colon tissue. c, Immunofluorescence staining of mouse intestinal organoids after 7 days of culture. d, Immunoblot of organoids treated with IGF-1 and densitometric quantification. e, Immunoblot of colon tissue of mice and densitometric quantification at day 14. f, Immunohistochemical staining of colon tissue of mice for AKT phosphorylated at Ser473. g, Immunofluorescence staining of mouse intestinal organoids after 7 days of culture. h, Immunoblot of organoids treated with IGF-1 and densitometric quantification. Scale bars, 2,500 μm (b, whole colon, left), 200 μm (b, magnified, right), 50 μm (c, f, g). *P < 0.05; **P < 0.01; ***P < 0.001 by two-tailed t-test (a and e) or one-way ANOVA (d, h). Data are from one experiment representative of two (a-c, e-g) or three independent experiments (d, h) and are presented as mean ± s.e.m. in a, d, e and h. For gel source data, see Supplementary Fig. 1.

Using flow cytometry, we profiled the immune cell populations in the colons of wild-type and Nlrc3−/− mice, both untreated and at days 8 and 14. We observed increased numbers of macrophages, neutrophils and natural killer cells in the colons of Nlrc3−/− mice compared to wild-type mice 14 days after azoxymethane and DSS treatment (Extended Data Fig. 3c, d). This is consistent with the increased levels of inflammation observed at this time point. However, untreated or day-8 wild-type and Nlrc3−/− mice did not differ in the relative numbers of macrophages, CD11b+CD11c+ cells, neutrophils, B cells, CD4+ T cells, CD8+ T cells or natural killer cells (Extended Data Fig. 3d). NLRC3 has been implicated in the regulation of T-cell activation. However, we did not observe a difference in the levels of IFN-γ+ or TNF+ CD4+ T cells when wild-type and Nlrc3−/− splenocytes were stimulated with anti-CD3 and anti-CD28 in the presence of IL-2 (Extended Data Fig. 3e).

We performed bone-marrow chimaera studies to identify the contri- 
bution of NLRC3 in haematopoietic cells versus radioresistant stromal 
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Figure 4 | NLRC3 regulates upstream signalling molecules within the 
PI3K-AKT-mTOR pathway. a, Immunofluorescence staining of primary 
fibroblasts and frequency of co-localization between mTOR and LAMP1 
(n> 150). b, Images and quantification of colon tumours in littermate 
ApeMi"!* and ApcMi"!+ Nirc3~'~ mice 40 days after treatment with vehicle 
or NVP-BEZ235. c, Immunohistochemical staining for phosphorylated 

S6 of colon tissue from mice treated as in b. d, Immunoprecipitation of 
the GFP tag in 239T cells transfected with a plasmid encoding GFP alone 
or NLRC3-GFP. e, Immunoblot of mouse colon tissue and densitometric 


cells during colitis-associated tumorigenesis. As expected, Nlrc3−/−
mice that received Nlrc3−/− bone marrow were more susceptible to
tumorigenesis than wild-type mice that received wild-type bone marrow
(Fig. 1h and Extended Data Fig. 4a). However, Nlrc3−/− mice that
received wild-type bone marrow also had a significantly increased tumour
burden compared with wild-type mice that received wild-type bone
marrow. In addition, wild-type mice that received Nlrc3−/− bone marrow
had a significantly increased tumour burden compared to wild-type
mice that received wild-type bone marrow (Fig. 1h and Extended Data
Fig. 4a). To confirm these findings, we generated mice that lacked
NLRC3 specifically in haematopoietic cells (Vav1creNlrc3fl/fl),
cells of the myeloid lineage (LysMcreNlrc3fl/fl) or intestinal epithelial
cells (Vil1creNlrc3fl/fl). Mice lacking NLRC3 in intestinal epithelial cells
developed the highest number of tumours, followed by mice lacking
NLRC3 in haematopoietic cells (Fig. 1i and Extended Data Fig. 4b).
Mice lacking NLRC3 in cells of the myeloid lineage had a similar
number of tumours to wild-type mice (Fig. 1i and Extended Data
Fig. 4b). These data suggest that the inhibitory effect of NLRC3 on
tumorigenesis is dominant in intestinal epithelial cells and
more subtle in haematopoietic cells.

A closer examination of the intestinal epithelial cells of Nlrc3−/−
mice revealed a significant increase in numbers of both Ki67+ and
PCNA+ (both proteins that are associated with cellular proliferation)
cells per intestinal crypt compared to wild-type mice at day 14 (Fig. 2a).
Additionally, colonic epithelial stem cells collected from Nlrc3−/− mice
more readily developed into organoids in ex vivo culture compared to


quantification at day 14. f, Immunoblot of primary mouse fibroblasts
transduced with a retroviral vector encoding GFP or NLRC3-GFP, with or
without stimulation with IGF-1, and densitometric quantification. Scale
bars, 10 μm (a), 200 μm (c). *P < 0.05; **P < 0.01; ****P < 0.0001;
NS, not statistically significant by two-tailed t-test (e) or one-way ANOVA
(a, b, f). Data are from one experiment representative of two (a–c, e) or
three (d and f) independent experiments (mean ± s.e.m. in a, b, e, f).

For gel source data, see Supplementary Fig. 1. 


those collected from wild-type mice (Fig. 2b). The average diameters of
the organoids derived from Nlrc3−/− mice were significantly increased
compared to the average diameters of the organoids from wild-type mice
(Fig. 2b). Expression of the stem-cell marker Lgr5 in the colon tissue
was similar in wild-type and Nlrc3−/− mice (Extended Data Fig. 4c),
suggesting that the differences in the number and size of intestinal
organoids were due to differential colony-forming capacity rather than
differences in the numbers of starting intestinal stem cells. To investigate
the effect of NLRC3 on cell proliferation more directly, we overexpressed
NLRC3 in the human colorectal carcinoma cell line HCT116 and found that
these cells exhibited reduced levels of proliferation compared to cells
expressing a control (GFP) protein (Fig. 2c). Furthermore, primary
Nlrc3−/− fibroblasts proliferated more rapidly than wild-type fibroblasts
(Fig. 2d).

Cellular proliferation can also be achieved when growth factors, 
nutrients and cellular energy activate metabolic pathways via the kinase 
mTOR!®. We found increased phosphorylation levels of S6 kinase, 
4E-BP1 and AKT at Ser473, the downstream targets of mTOR, in the 
colon tissue of Nlrc3−/− mice compared to wild-type mice at day 14
(Fig. 3a, b). Increased phosphorylation of these mTOR targets was
also observed in Nlrc3−/− organoids compared to wild-type organoids
(Fig. 3c, d). However, we found no difference in the expression of genes
involved in the Wnt signalling pathway, including Wnt1, Ctnnb1, Lef1,
Tcf4, Tcf7 and Axin2 (Extended Data Fig. 4c). The extent of nuclear
localization of β-catenin was also similar between wild-type and
Nlrc3−/− mice (Extended Data Fig. 4d).
Dysregulation of the mTOR signalling pathways in Nlrc3−/− mice
occurred very early, 8 days after injection of azoxymethane, whereas no
difference in the production of inflammatory cytokines and mediators
or in the phosphorylation of IκBα was observed at this time point (Extended
Data Fig. 5a–c). Moreover, the differential AKT-mTOR signalling
observed at this time point was not caused by differences in the relative
levels of immune cells recruited to the colon (Extended Data Fig. 3d).
These data suggest that the dysregulated mTOR signalling observed at
an earlier time point may lead to increased NF-κB signalling at a later
time point.

Phosphorylation and activation of mTOR are driven by a number
of upstream signalling proteins in the PI3K-AKT-mTOR pathway.
Phosphorylation of AKT at Thr308 by the kinase PDK1 allows
AKT to activate mTOR17–19. To examine whether NLRC3 directly
affects the apical molecules in the PI3K-AKT-mTOR pathway, we
first investigated the phosphorylation status of AKT at Thr308.
We observed elevated phosphorylation of AKT at Thr308 in the colon
tissue of Nlrc3−/− mice compared to wild-type mice at day 14 (Fig. 3e).
Increased AKT phosphorylation in colon tissue was observed
predominantly in epithelial cells and, to a lesser extent, in infiltrating
cells (Fig. 3f). Similarly, increased phosphorylation of AKT at Thr308
was observed in Nlrc3−/− organoids treated with IGF-1 compared
to wild-type controls (Fig. 3g, h). We further observed increased
activation of PDK1 in the colon tissue of Nlrc3−/− mice treated with
azoxymethane and DSS compared to wild-type mice (Fig. 3e). In
addition, elevated phosphorylation of AKT at Thr308 and
of 4E-BP1 was observed in the colon of Nlrc3+/− heterozygous mice
compared to littermate wild-type mice at day 14, although this increase
was smaller than that observed in littermate Nlrc3−/− homozygous
mice (Extended Data Fig. 6a). This gene-dose-dependent effect of
NLRC3 on the suppression of the mTOR signalling pathways mirrors
the gene-dose-dependent effect of NLRC3 on the
suppression of tumorigenesis (Fig. 1g).

Activated mTOR is phosphorylated and migrates to lysosomal and
late endosomal membranes20,21. We observed increased phosphorylation
of mTOR in the colon tissue of Nlrc3−/− mice (Fig. 3e). We
also found an increased co-localization frequency between mTOR
and LAMP1 puncta in IGF-1-treated primary Nlrc3−/− fibroblasts
compared to IGF-1-treated primary wild-type fibroblasts (Fig. 4a).
Increased mTOR signalling was observed in Nlrc3−/− fibroblasts and in
wild-type fibroblasts treated with short interfering RNAs (siRNAs)
against Nlrc3 compared to their corresponding controls (Extended
Data Figs 6b–e, 7a–d).

We further investigated whether NLRC3 is able to restrict proliferation
in a spontaneous mouse model of colon cancer. We crossed the
mouse line carrying a heterozygous mutation in the gene encoding
adenomatous polyposis coli (ApcMin/+) with Nlrc3−/− mice and
found that ApcMin/+ Nlrc3−/− mice had a higher tumour burden than
ApcMin/+ control mice (Extended Data Fig. 8a). Of the ApcMin/+ Nlrc3−/−
mice, 40% developed hyperplasia (compared to 0% in the littermate
control group), and ApcMin/+ Nlrc3−/− mice exhibited increased damage
in the colon (Extended Data Fig. 8b, c). Notably, we observed increased
numbers of Ki67+ proliferative cells and of cells positive for phosphorylated
S6 kinase in the colon of ApcMin/+ Nlrc3−/− mice compared
to ApcMin/+ mice (Extended Data Fig. 8c). Moreover, the capacity of
ApcMin/+ Nlrc3−/− intestinal stem cells to proliferate into organoids
was greater than that of ApcMin/+ intestinal stem cells (Extended Data
Fig. 8d). Treatment with the PI3K inhibitor LY294002 or the mTOR
inhibitor rapamycin impaired the ability of intestinal stem cells to
proliferate into organoids in both strains (Extended Data Fig. 8d).
Treatment of ApcMin/+ mice and ApcMin/+ Nlrc3−/− mice with the
dual PI3K-mTOR inhibitor NVP-BEZ235 reduced the tumour burden
and phosphorylation of S6 kinase in the tumours and enterocytes of
ApcMin/+ Nlrc3−/− mice to levels observed in treated ApcMin/+ mice
(Fig. 4b, c). Collectively, these data suggest that NLRC3 restricts cellular
proliferation via the PI3K-mTOR axis during colon tumorigenesis.
Following generation of inositol phospholipids by activated
PI3Ks, AKT is recruited to the cell membrane, where it undergoes a
conformational change and is phosphorylated at Thr308 by PDK1
(refs 17–19). Co-immunoprecipitation assays showed that NLRC3
interacted weakly with PDK1 and did not interact with AKT (Fig. 4d).
Instead, we found that NLRC3 co-immunoprecipitated with the p85
subunit of PI3K (Fig. 4d). In addition, we observed increased
interaction between the p85 and p110α subunits of PI3K in Nlrc3−/−
primary fibroblasts and mouse bone-marrow-derived macrophages
(BMDMs) (Extended Data Fig. 9a, b). We also observed a higher level
of phosphorylation and activation of PI3K p85 in the colon tissue of
Nlrc3−/− mice treated with azoxymethane and DSS compared to wild-type
mice (Fig. 4e). These data suggest that NLRC3
disrupts the association between the PI3K p85 and p110α subunits and
reduces the activity of PI3K p85 itself. Deletion of the CARD, NACHT
or LRR domain of NLRC3 impaired the ability of NLRC3 to interact
with either the p85 or the p110α subunit of PI3K (Extended Data
Fig. 9c–g). Reconstitution of NLRC3 in Nlrc3−/− fibroblasts reduced
the phosphorylation of AKT Thr308 and other downstream
molecules to levels similar to those seen in wild-type fibroblasts upon
stimulation with IGF-1 (Fig. 4f).

In addition to growth factor receptors, activation of TLR4 can engage 
the PI3K-AKT-mTOR pathway***, We observed increased activation 
of the mTOR signalling pathways in lipopolysaccharide (LPS)-treated 
primary Nlrc3−/− BMDMs compared to LPS-treated wild-type BMDMs
(Extended Data Fig. 10a, b). We further confirmed our findings in an
independently generated line of NLRC3-deficient mice that we term 
Nirc3'"4 mice (NLRC3 large deletion, data not shown; see Methods). 
Collectively, these findings identify NLRC3 as an inhibitory sensor
of the PI3K-AKT-mTOR pathway that mediates protection against
tumorigenesis in colorectal cancer (Extended Data Fig. 10c).

NLRC3 does not act solely to protect against cancer. A previous study 
has shown that expression of NLRC3 is downregulated in patients with 
the autoimmune disease Wegener’s granulomatosis”. Moreover, a 
loss-of-function mutation in the gene encoding the NLRC3-like 
protein in zebrafish results in systemic inflammation”’. These findings 
collectively suggest that the function of NLRC3 is conserved across
species. Whether NLRC3 needs to bind a specific ligand or exerts its
regulatory function in a ligand-independent manner remains to be
explored. Understanding the precise functions of NLRC3 could open up
new avenues in the treatment of infectious and autoinflammatory
diseases and cancer.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS

Mice. Wild-type (C57BL/6) and Nlrc3−/− mice were bred and maintained under
specific pathogen-free conditions at St. Jude Children’s Research Hospital. To
generate the Nlrc3−/− mice, exons 2 and 3 of the gene encoding NLRC3 (2.5 kb)
were excised, which resulted in the loss of 661 amino acids spanning the
N-terminal caspase-recruitment domain (also known as CARD) and the central
nucleotide-binding domain (also known as NACHT) of NLRC3. Splicing of exon
1 to exon 4 led to a frameshift, generating a premature stop codon (Extended
Data Fig. 1c). The targeting vector ROS1-HR was linearized and transfected into
embryonic stem (ES) cells. Cells positive for the targeting vector were selected by
resistance to 200 μg ml−1 of G418 and further screened for 5′ and 3′ homologous
recombination. ES cell clones with the correct targeting events were used for
blastocyst injection. CMV-Cre mice were used to delete the conditional floxed Nlrc3
allele to generate the Nlrc3−/− mice. B6.129P2-Lyz2tm1(cre)Ifo/J (004781, The Jackson
Laboratory), B6.Cg-Tg(Vav1-icre)A2Kio/J (008610, The Jackson Laboratory) and
B6.Cg-Tg(Vil-cre)997Gum/J mice (004586, The Jackson Laboratory) were used to
delete the conditional floxed Nlrc3 allele in a cell-type-specific manner. The Nlrc3
genotype of wild-type and Nlrc3−/− mice was confirmed by genomic PCR
amplification (Extended Data Fig. 1c). Nlrc3−/− mice were backcrossed to C57BL/6 for
nine generations.
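The two-PCR genotyping scheme described in the legend of Extended Data Fig. 1c reduces to a small decision rule, sketched in Python below. The band sizes come from that legend (4,804 bp wild-type and 2,309 bp knockout products in PCR1; a 940-bp exon-3 product in PCR2). Representing each gel lane as a set of observed band sizes, and the "inconclusive" fallback, are assumptions of this sketch rather than part of the published protocol.

```python
from typing import Set

PCR1_WT_BP = 4804    # P1/P2 product from the wild-type allele
PCR1_KO_BP = 2309    # P1/P2 product from the exon 2-3 deleted allele
PCR2_EXON3_BP = 940  # P3/P4 product, present only if exon 3 is retained


def call_genotype(pcr1_bands: Set[int], pcr2_bands: Set[int]) -> str:
    """Call WT, HET or KO from the band sizes (bp) seen in each PCR lane."""
    has_exon3 = PCR2_EXON3_BP in pcr2_bands
    has_ko = PCR1_KO_BP in pcr1_bands
    has_wt = PCR1_WT_BP in pcr1_bands
    if not has_exon3:
        # No exon-3 product: knockout, provided the deleted allele amplified.
        return "KO" if has_ko else "inconclusive"
    if has_ko:
        # Exon 3 retained but the deleted allele is present. PCR1 alone cannot
        # separate HET from KO because the short product outcompetes the long one.
        return "HET"
    return "WT" if has_wt else "inconclusive"


if __name__ == "__main__":
    print(call_genotype({2309}, {940}))
```

The order of the checks matters: PCR2 is consulted first because, as the legend notes, PCR1 on its own cannot distinguish heterozygous from knockout mice.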

Mirc3'*"4 (Nirc3 large deletion) mice were generated by injection of two single 
guide RNAs (Nlrc3-Guide-01: 3′-ATTCCCAGGTCGTCTTAGGC-5′; 125 ng μl−1,
and Nlrc3-Guide-02: 3′-AGTGGAACAGCACAGTTCGC-5′; 125 ng μl−1),
designed to introduce DNA double-strand breaks into intron 1 and intron 3 of the
Nlrc3 gene, together with a codon-optimized Cas9 mRNA transcript (50 ng μl−1),
into the cytoplasm of pronuclear-stage C57BL/6J zygotes (Transgenic Core
Unit, St. Jude Children’s Research Hospital). Injected embryos were surgically
transplanted into the oviducts of pseudopregnant CD1 females, and newborn
mice bearing the intron-1-to-intron-3 deletion (~3.8 kb) were identified by
amplification of a 1.1-kb fragment using primers flanking the two break
sites: Nlrc3-F51: 3′-AGAGTGGTGCCATCTTCTGC-5′ and Nlrc3-R32:
3′-CTCAAGTCAGGGCAGCATGA-5′. Sanger sequencing of the ~1.1-kb amplicon
confirmed proper deletion of the 3.8-kb fragment containing exons 2 and 3. The
sgRNAs and Cas9 mRNA transcript were designed and generated as described 
previously’. Potential off-target sites were identified using Cas-OFFinder”*, 
amplified by PCR and sequenced. No off-target site cleavage was observed. Two 
founder mice were used to establish the mouse lines. Animal study protocols were 
approved by the St. Jude Children's Research Hospital Committee on Use and 
Care of Animals. 

Azoxymethane-DSS model of colorectal tumorigenesis. Male and female
mice were used at the age of six weeks. For co-housing experiments, wild-type
and Nlrc3−/− mice were co-housed for three weeks and separated before injection
of azoxymethane. We also performed experiments in which wild-type and
Nlrc3−/− mice were co-housed for three weeks before injection of azoxymethane
and remained co-housed over the course of the experiments. In both cases,
the results did not differ. Mice were injected intraperitoneally with 10 mg of 
azoxymethane (A5486, Sigma) per kg body weight, according to previously 
established protocols”. After 5 days, 2% DSS (9011-18-1, Affymetrix eBioscience) 
was given in the drinking water for 6 days followed by regular drinking water for 
2 weeks. This cycle was repeated twice more with 1.5% DSS and mice were killed 
on day 80 (Extended Data Fig. 1a). For day 8 samples, mice were injected with 
azoxymethane, and after 5 days, fed with 2% DSS for 3 days before being killed. 
For day 14 samples, mice were injected with azoxymethane, and after 5 days,
fed with 2% DSS for 6 days. Mice were then fed with regular water for 3 days and 
collected. Bone-marrow chimaera studies were performed as described previously’. 
No randomization or blinding was performed. 
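The azoxymethane dose and the DSS cycles described above can be tabulated programmatically when planning a cohort. The Python sketch below is illustrative only: the 20-g body weight in the usage is hypothetical, and because the text states only that mice were killed on day 80 after three DSS cycles, the sketch assumes that the final regular-water phase simply extends to day 80.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Phase:
    start_day: int  # inclusive
    end_day: int    # exclusive
    treatment: str


def aom_dose_mg(body_weight_g: float, dose_mg_per_kg: float = 10.0) -> float:
    """Azoxymethane dose (mg) for one mouse at 10 mg per kg body weight."""
    return dose_mg_per_kg * body_weight_g / 1000.0


def aom_dss_schedule(endpoint_day: int = 80) -> List[Phase]:
    """Chronic AOM-DSS timeline: AOM on day 0, then three DSS/water cycles."""
    phases = [Phase(0, 5, "AOM i.p. on day 0, regular water")]
    day = 5
    for dss in ("2% DSS", "1.5% DSS", "1.5% DSS"):
        phases.append(Phase(day, day + 6, dss))                # 6 days of DSS
        day += 6
        phases.append(Phase(day, day + 14, "regular water"))   # 2 weeks of water
        day += 14
    # Assumption: the last water phase continues until the endpoint.
    last = phases[-1]
    phases[-1] = Phase(last.start_day, max(last.end_day, endpoint_day), last.treatment)
    return phases


if __name__ == "__main__":
    print(f"AOM dose for a 20-g mouse: {aom_dose_mg(20):.2f} mg")
    for p in aom_dss_schedule():
        print(f"days {p.start_day:>2}-{p.end_day:>2}: {p.treatment}")
```

With the stated cycle lengths the three DSS courses begin on days 5, 25 and 45, and the third water phase would end on day 65 without the endpoint extension.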

ApcMin model of colorectal tumorigenesis. C57BL/6J-ApcMin/J mice (002020,
The Jackson Laboratory) were crossed with Nlrc3−/− mice. Littermate ApcMin/+
and ApcMin/+ Nlrc3−/− mice were administered either 40 mg kg−1 body
weight of the dual inhibitor of PI3K and mTOR, NVP-BEZ235 (N-4288,
LC Laboratories) dissolved in 10% (v/v) 1-methyl-2-pyrrolidone (328634, Sigma) 
plus 90% (v/v) polyethylene glycol 300 (90878, Sigma) or the control vehicle 10% 
(v/v) 1-methyl-2-pyrrolidone plus 90% (v/v) polyethylene glycol 300 by daily oral 
gavage for 40 days from 6 weeks of age. 
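The per-animal arithmetic for the NVP-BEZ235 regimen is simple enough to sketch in Python. The 40 mg kg−1 daily dose, the 10%/90% (v/v) vehicle split and the 40-day course come from the text; the body weights and the batch volume in the usage are hypothetical, because the dosing volume per gavage is not stated, and the cohort total assumes body weight stays constant over the course.

```python
from typing import Dict, Iterable


def bez235_dose_mg(body_weight_g: float, dose_mg_per_kg: float = 40.0) -> float:
    """Daily NVP-BEZ235 dose (mg) for one mouse at 40 mg per kg body weight."""
    return dose_mg_per_kg * body_weight_g / 1000.0


def vehicle_volumes_ml(total_ml: float) -> Dict[str, float]:
    """Split a vehicle volume into 10% (v/v) NMP and 90% (v/v) PEG 300."""
    return {
        "1-methyl-2-pyrrolidone": 0.10 * total_ml,
        "polyethylene glycol 300": 0.90 * total_ml,
    }


def course_total_mg(body_weights_g: Iterable[float], days: int = 40) -> float:
    """Total drug for a cohort over a daily course (weights held constant)."""
    return sum(bez235_dose_mg(w) for w in body_weights_g) * days


if __name__ == "__main__":
    print(bez235_dose_mg(25.0))            # daily dose for a hypothetical 25-g mouse
    print(vehicle_volumes_ml(10.0))        # hypothetical 10-ml vehicle batch
    print(course_total_mg([20.0, 25.0]))   # two hypothetical mice, 40 days
```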

Histology and microscopy analysis. Colons were rolled into a ‘Swiss roll’ and fixed 
in 10% formalin, processed and embedded in paraffin by standard techniques. 
Longitudinal sections of 5 μm were stained with haematoxylin and eosin and
examined by a pathologist blinded to the experimental groups. Colitis scores were
assigned based on inflammation, ulceration, hyperplasia and the extent or severity
of damage. Severity scores for inflammation were assigned as follows: 0 = normal
(within normal limits); 2 = minimal (mixed inflammation, small, focal or widely
separated, limited to lamina propria); 15 = mild (multifocal mixed inflammation,
often extending into submucosa); 40 = moderate (large multifocal lesions within
mixed inflammation involving mucosa and submucosa); 80 = marked (extensive
mixed inflammation with oedema and erosions); 100 = severe (diffuse inflammation
with transmural lesions and multiple ulcers). Scores for ulceration were
assigned as follows: 0 = normal (none); 2 = minimal (only one small focus of
ulceration involving fewer than 5 crypts); 15 = mild (a few small ulcers up to
5 crypts); 40 = moderate (multifocal ulcers up to 10 crypts); 80 = marked
(multifocal to coalescing ulcers involving more than 10 crypts each); 100 = severe
(extensive to diffuse with multiple ulcers covering more than 20 crypts each).
Scores for hyperplasia were assigned as follows: 0 = normal; 2 = minimal (some
areas with crypts elongated and increased mitoses); 15 = mild (multifocal areas
with crypts elongated up to twice the normal thickness, normal goblet cells 
present); 40 = moderate (extensive areas with crypts up to twice the normal 
thickness, reduced goblet cells); 80 = marked (mucosa over twice the normal 
thickness, hyperchromatic epithelium, reduced or rare goblet cells, possibly 
foci of arborization); 100 = severe (mucosa twice the normal thickness, marked 
hyperchromasia, crowding or stacking, absence of goblet cells, high mitotic index 
and arborization). Scores of extent were assigned as follows: 0 = normal (rare 
or inconspicuous lesions); 2 = minimal (less than 5% involvement); 15 = mild 
(multifocal but conspicuous lesions, 5-10% involvement); 40 = moderate 
(multifocal, prominent lesions, 10-50% involvement); 80 = marked (coalescing 
to extensive lesions or areas of inflammation with some loss of structure, 50-90% 
involvement); 100 = severe (diffuse lesion with effacement of normal structure, 
over 90% involvement). The proliferating cells in the intestinal epithelium were 
detected by immunoperoxidase staining for Ki67 and PCNA. The immunohistochemistry
antibodies used were: Ki67 (NBP1-40684, Novus), PCNA (M0879,
DAKO), β-catenin (610154, BD), P-AKT Ser473 (4060, Cell Signaling) and P-S6
Ribosomal Protein Ser235/236 (4858, Cell Signaling). Tissues were counterstained
with haematoxylin. The number of Ki67+ or PCNA+ cells per crypt in each animal
was counted (at least 18–20 crypts per mouse).
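The histology scale above maps each descriptive grade onto a fixed point value, and proliferation is summarized as positive cells per crypt. A Python sketch of that bookkeeping follows. The text does not say how the four component scores are combined into a colitis score, so the summed total below is an assumption and is labelled as such; the grades and crypt counts in the usage are hypothetical.

```python
from statistics import mean
from typing import Dict, List

# Point values shared by all four components of the histology scale.
GRADE_POINTS = {"normal": 0, "minimal": 2, "mild": 15,
                "moderate": 40, "marked": 80, "severe": 100}
COMPONENTS = ("inflammation", "ulceration", "hyperplasia", "extent")


def colitis_scores(grades: Dict[str, str]) -> Dict[str, int]:
    """Convert descriptive grades for each component into points."""
    missing = [c for c in COMPONENTS if c not in grades]
    if missing:
        raise ValueError(f"missing component grades: {missing}")
    scores = {c: GRADE_POINTS[grades[c]] for c in COMPONENTS}
    # Assumption: a simple sum as one possible overall summary.
    scores["total_assumed_sum"] = sum(scores.values())
    return scores


def positive_cells_per_crypt(counts: List[int], min_crypts: int = 18) -> float:
    """Mean Ki67+ (or PCNA+) cells per crypt for one mouse."""
    if len(counts) < min_crypts:
        raise ValueError(f"need at least {min_crypts} crypts, got {len(counts)}")
    return float(mean(counts))


if __name__ == "__main__":
    print(colitis_scores({"inflammation": "mild", "ulceration": "minimal",
                          "hyperplasia": "moderate", "extent": "mild"}))
    print(positive_cells_per_crypt([3, 4, 5] * 6))
```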

Cell culture and stimulation of cells. Pinnae of adult wild-type and Nlrc3−/− mice
were minced and digested with 100 mg ml−1 collagenase type IV (LS004188,
Worthington Biochemical Corporation) for 3 h, followed by filtration through
70-μm strainers to obtain fibroblasts. Cells were cultured in 50% FBS (TMS-013-B,
Millipore) in DMEM (11995073, ThermoFisher Scientific) supplemented with
HEPES (15630-080, ThermoFisher Scientific), 1% penicillin and streptomycin
(15070-063, ThermoFisher Scientific), L-glutamine (25030, ThermoFisher
Scientific), sodium pyruvate (11360, ThermoFisher Scientific), non-essential
amino acids (11140, ThermoFisher Scientific) and β-mercaptoethanol (21985023,
ThermoFisher Scientific) for the first 3-4 days. Cells were subcultured in DMEM 
supplemented with 10% FBS and 1% penicillin and streptomycin. All primary 
fibroblasts were used before reaching the sixth passage. Fibroblasts were seeded 
onto six-well plates at a density of 2 x 10° cells per well. Cells were deprived of 
serum for 36 h and further incubated in PBS for 1 h. Cells were then stimulated
with 50 ng ml−1 of recombinant murine IGF-1 (250-19, Peprotech) for the
indicated time. 

BMDMs were cultured as described previously*°. BMDMs were stimulated 
with 500 ng ml−1 ultrapure LPS from Salmonella minnesota R595 (tlrl-smlps,
InvivoGen) for the indicated time. The human colorectal carcinoma HCT116 
cell line (ATCC#CCL-247, American Type Culture Collection) was cultured in 
McCoy’s 5A medium (16600-082, ThermoFisher Scientific) supplemented with 
10% FBS and 1% penicillin and streptomycin. The embryonic kidney epithelial cell 
line HEK293T (ATCC#3216, American Type Culture Collection) and L929 cell line 
(ATCC#CRL-2648, American Type Culture Collection) were cultured in DMEM 
supplemented with 10% FBS. All cell lines were maintained at 37 °C with 5% CO2.
Colon organoid culture. Mouse colon stem cells were cultured using IntestiCult 
organoid growth medium according to the manufacturer's instructions (06005, 
STEMCELL Technologies). The whole colon was removed from untreated wild- 
type and Nirc3~/~ mice and rinsed with cold PBS. The colon was cut into 2-mm 
segments and washed 20 times with cold PBS. Colonic segments were incubated 
in Gentle Cell Dissociation Reagent (07174, STEMCELL Technologies), rotated 
at 350g for 15 min at room temperature, followed by re-suspension in PBS 
supplemented with 0.1% BSA (A6003, Sigma). Dissociated colonic crypts were filtered 
through 70-μm strainers. Dissociated colonic crypts were resuspended in DMEM/
F12 medium with 15 mM HEPES (36254, STEMCELL Technologies), counted and
resuspended in IntestiCult organoid growth medium and Matrigel (356230, Corning)
in a 1:1 ratio. Cells were plated in 24-well culture plates (3738, Corning). IntestiCult
organoid growth medium was added to the cell culture plates to immerse the
matrix composed of Intesticult organoid growth medium and Matrigel. For inhi- 
bition studies, 501M of LY294002 (440202, Millipore) or 10g ml”! rapamycin 
(553210, Sigma) was added to the Intesticult organoid growth medium. 
Proliferation assay. Cell proliferation was measured using the WST-1 reagent 
(05015944001, Roche). Primary ear fibroblasts or HCT116 cells were plated at 
a density of 5,000 cells per well in 96-well tissue culture plates and incubated 
overnight. Cells were deprived of serum for 36h and exposed to normal culture
media for the indicated time interval. The WST-1 reagent was added to the cells 
for 2h. Plates were then read at 450 nm. The number of cells was calculated using 
a standard curve"). 
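Converting WST-1 absorbance readings into cell numbers amounts to fitting a line to standards of known cell number and then inverting it. The Python sketch below assumes a linear relationship over the range of the standards (the text says only that a standard curve was used); the standard values in the usage are hypothetical.

```python
from typing import Sequence, Tuple


def fit_standard_curve(cells: Sequence[float],
                       absorbance: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of A450 = slope * cells + intercept."""
    if len(cells) != len(absorbance) or len(cells) < 2:
        raise ValueError("need at least two paired standards")
    n = len(cells)
    mx = sum(cells) / n
    my = sum(absorbance) / n
    sxx = sum((x - mx) ** 2 for x in cells)
    sxy = sum((x - mx) * (y - my) for x, y in zip(cells, absorbance))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def cells_from_absorbance(a450: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate cell number from A450."""
    return (a450 - intercept) / slope


if __name__ == "__main__":
    # Hypothetical standards: cells per well and their A450 readings.
    slope, intercept = fit_standard_curve([0, 5000, 10000, 20000],
                                          [0.10, 0.35, 0.60, 1.10])
    print(round(cells_from_absorbance(0.85, slope, intercept)))
```

Readings that fall outside the range of the standards should not be extrapolated this way, since WST-1 signal is only approximately linear in cell number.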
Immunoblotting. Proteins were extracted from colon tissue or cells using RIPA 
lysis buffer supplemented with protease (11697498001, Roche) and phosphatase 
inhibitors (04906837001, Roche) as described previously*”. Samples were resolved 
in 8–15% SDS-PAGE and transferred onto PVDF membranes (IPVH00010,
Millipore). Blocking was performed in 5% milk for 1h and membranes were 
incubated in primary antibodies overnight at 4°C. Membranes were incubated 
with HRP-conjugated secondary antibody for 1h and proteins were visualized 
using Super Signal Femto substrate (34096, ThermoFisher Scientific). The 
primary antibodies used were P-ERK (1:1,000, 9101, Cell Signaling), ERK (1:1,000, 
9102, Cell Signaling), P-IκBα (1:1,000, 9241, Cell Signaling), IκBα (1:1,000, 9242,
Cell Signaling), P-AKT Ser473 (1:1,000, 4060, Cell Signaling), P-AKT Thr308
(1:1,000, 13038), AKT (1:1,000, 4691, Cell Signaling), P-mTOR Ser2448 (1:1,000,
2971, Cell Signaling), mTOR (1:1,000, 2972, Cell Signaling), P-p70 S6K Thr389
(1:1,000, 9205, Cell Signaling), P-S6 Ser235/236 (1:1,000, 4856, Cell Signaling),
P-4E-BP1 Thr37/46 (1:1,000, 2855, Cell Signaling), P-STAT3 Tyr705 (1:1,000, 9131,
Cell Signaling), P-PDK1 (1:1,000, 3061, Cell Signaling), PDK1 (1:1,000, 3062, Cell
Signaling), P-PI3K p85 (1:1,000, 4228, Cell Signaling), PI3K p85 (1:1,000, 4257, Cell
Signaling), PI3K p110α (1:1,000, 4249, Cell Signaling), mouse anti-GFP (1:2,000,
sc-9996, Santa Cruz Biotechnology), rabbit anti-GFP (1:2,000, sc-8334, Santa Cruz
Biotechnology), β-actin (1:2,000, 8457, Cell Signaling) and anti-GAPDH (1:10,000,
5174, Cell Signaling). Immunoblots were quantified using ImageJ.
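Densitometric quantification of blots is commonly expressed as a phospho-band signal normalized to the corresponding total protein (or loading control) and then to a control sample. That normalization scheme is an assumption here, because the text states only that ImageJ was used. The Python sketch below starts from band intensities already exported from ImageJ; the numbers in the usage are hypothetical.

```python
from typing import Dict, Tuple


def normalized_ratio(phospho: float, total: float) -> float:
    """Phospho-band intensity divided by total-protein (or loading-control) band."""
    if total <= 0:
        raise ValueError("total/loading-control intensity must be positive")
    return phospho / total


def fold_change_vs_control(samples: Dict[str, Tuple[float, float]],
                           control: str) -> Dict[str, float]:
    """Express each sample's phospho/total ratio relative to the control sample."""
    ref = normalized_ratio(*samples[control])
    return {name: normalized_ratio(p, t) / ref for name, (p, t) in samples.items()}


if __name__ == "__main__":
    bands = {  # (phospho-AKT Thr308, total AKT), arbitrary density units, hypothetical
        "WT": (1200.0, 1000.0),
        "Nlrc3-/-": (3000.0, 1100.0),
    }
    print(fold_change_vs_control(bands, control="WT"))
```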
Immunofluorescence staining. Serum-deprived primary ear fibroblasts were left 
untreated or stimulated with IGF-1 for 30 min. Fibroblasts or six-day-old intestinal 
organoids were washed three times with PBS and were fixed for 15 min at room
temperature in 4% paraformaldehyde, followed by blocking in 10% normal goat serum
(X090710-8, Dako) supplemented with 0.1% saponin (47036, Sigma). Fibroblasts or 
organoids were incubated with the following antibodies overnight at 4°C: P-AKT 
Ser473 (1:200, 4060, Cell Signaling), P-AKT Thr308 (1:200, 13038), P-S6 Ser235/236
(1:500, 4856, Cell Signaling), P-4E-BP1 Thr37/46 (1:200, 2855, Cell Signaling). 
Samples were also stained with Alexa Fluor 488 phalloidin (1:500, A12379, 
ThermoFisher Scientific) or Alexa Fluor 488 E-cadherin (1:200, 53-3249-80, 
Affymetrix eBioscience). To analyse mTOR activation, cells were incubated over- 
night at 4°C with antibodies against mTOR (1:200, 2983, Cell Signaling) and 
LAMP1 (1:1,000, eBio1D4B, Affymetrix eBioscience). The secondary antibodies
used were Alexa Fluor 568-conjugated antibody to rabbit immunoglobulin G 
(1:250; A11036; ThermoFisher Scientific), Alexa Fluor 568-conjugated antibody 
to rat immunoglobulin G (1:250; A11077; ThermoFisher Scientific) and Alexa 
Fluor 488-conjugated antibody to rabbit immunoglobulin G (1:250; A11034; 
ThermoFisher Scientific). Cells and organoids were counterstained in DAPI 
mounting medium (H-1200, Vector Labs), and images were taken with a Nikon C2
confocal microscope. The average density unit and the Pearson’s correlation 
coefficient were calculated using the digital microscopy imaging software SlideBook 
5 (Intelligent Imaging Innovations). 
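Pearson's correlation coefficient between two channels (here mTOR and LAMP1) is the covariance of the paired pixel intensities divided by the product of their standard deviations. The study computed it in SlideBook 5; the standalone Python function below is only a sketch of the same statistic on hypothetical intensity lists, without any background subtraction or region masking that imaging software may apply.

```python
from math import sqrt
from typing import Sequence


def pearson_r(channel_a: Sequence[float], channel_b: Sequence[float]) -> float:
    """Pearson's r between paired pixel intensities of two channels."""
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("channels must be paired and contain at least two pixels")
    n = len(channel_a)
    ma = sum(channel_a) / n
    mb = sum(channel_b) / n
    cov = sum((a - ma) * (b - mb) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - ma) ** 2 for a in channel_a)
    var_b = sum((b - mb) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a channel with constant intensity has undefined r")
    return cov / sqrt(var_a * var_b)


if __name__ == "__main__":
    mtor = [10, 50, 80, 20, 90]    # hypothetical pixel intensities
    lamp1 = [12, 45, 85, 18, 95]
    print(round(pearson_r(mtor, lamp1), 3))
```

Values near +1 indicate that bright mTOR pixels coincide with bright LAMP1 pixels; values near 0 indicate no linear relationship.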
Quantitative reverse-transcriptase PCR. RNA was isolated using TRIzol 
(15596026, ThermoFisher Scientific) and converted to cDNA using the High- 
Capacity cDNA Reverse Transcription kit (4368814, Applied Biosystems). 
Gene expression was assessed using the 2x SYBR Green kit (4368706, Applied 
Biosystems) according to the manufacturer’s instructions. Primer sequences are 
listed in the Supplementary Information. 
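Relative expression from SYBR Green data is often computed with the comparative Ct (2^−ΔΔCt) method. The text does not state which quantification method or reference gene was used, so the Python sketch below illustrates only the generic calculation; the housekeeping-gene normalization, the near-100% amplification efficiency and the Ct values in the usage are assumptions.

```python
def delta_ct(ct_target: float, ct_reference: float) -> float:
    """Target Ct minus reference (housekeeping) gene Ct for one sample."""
    return ct_target - ct_reference


def relative_expression(ct_target: float, ct_reference: float,
                        ct_target_calibrator: float,
                        ct_reference_calibrator: float) -> float:
    """Fold change of the target gene versus a calibrator sample by 2^-ddCt.

    Assumes roughly 100% amplification efficiency for both primer pairs.
    """
    ddct = (delta_ct(ct_target, ct_reference)
            - delta_ct(ct_target_calibrator, ct_reference_calibrator))
    return 2.0 ** (-ddct)


if __name__ == "__main__":
    # Hypothetical Ct values: sample versus calibrator, same reference gene.
    print(relative_expression(25.0, 20.0, 27.0, 20.0))
```

A sample whose target gene crosses threshold two cycles earlier than the calibrator (after reference normalization) therefore reports a fourfold higher expression.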
Cytokine measurement by ELISA. Cytokines in the colon and sera were measured 
by ELISA according to manufacturer’s instructions. IL-18 was measured using an 
ELISA kit (BMS618/3TEN, Affymetrix eBioscience) and all other cytokines were 
measured by a multiplex ELISA (MCYTOMAG-70K, Millipore).
Flow cytometry. Colons were dissected, washed with ice-cold PBS and cut into 
small pieces. Colon pieces were incubated with PBS containing 1 mM DTT, 5 mM
EDTA and 10 mM HEPES at 37 °C for 30 min with gentle shaking to remove the epithelial
layer. The colon segments were further digested in RPMI medium containing
0.5 mg ml−1 collagenase D at 37 °C for 1.5 h. The supernatant was passed through
a 70-μm cell strainer and enriched by 37.5% Percoll to isolate lamina propria cells.
The following monoclonal antibodies were used for flow cytometry: CD4 
(RM4-5; 14-0042-85), CD11b (M1/70; 48-0112-82) and CD8a (53-6.7; 48-0081- 
82) from Affymetrix eBioscience, CD19 (6D5; 115512), NK1.1 (PK136; 108708), 
CD11c (N418; 117306), Gr1 (RB6-8C5; 108426) and F4/80 (BM8; 123109) from 
BioLegend. The dilution factor for all antibodies was 1:300. The following gating 
strategies were used: B cells were gated as live cells and CD19+. CD4+ T cells were
gated as live cells, CD4+ and CD8−. CD8+ T cells were gated as live cells, CD8+
and CD4−. Natural killer cells were gated as live cells and NK1.1+. Macrophages
were gated as live cells, CD11b+, Gr1low–neg, F4/80+ and CD11c−. Neutrophils were
gated as live cells, CD11b+ and Gr1hi. CD11b+CD11c+ cells were gated as live cells,
CD11b+, Gr1low–neg, CD11c+ and F4/80−. Flow cytometry data were acquired on
a BD FACSCalibur and analysed using TreeStar FlowJo software.
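The gating definitions above are Boolean rules over per-event marker calls, which the Python sketch below encodes after live-cell gating. It assumes that each event has already been called positive or negative for each marker and "neg", "low" or "hi" for Gr1. The thresholds that produce those calls and the example event are hypothetical, and FlowJo applies gates hierarchically on the cytometry data rather than through a function like this one.

```python
from typing import Callable, Dict, List, Union

Event = Dict[str, Union[bool, str]]

# Each population is a predicate over one live event; rules follow the text.
GATES: Dict[str, Callable[[Event], bool]] = {
    "B cells": lambda e: bool(e["CD19"]),
    "CD4+ T cells": lambda e: bool(e["CD4"]) and not e["CD8"],
    "CD8+ T cells": lambda e: bool(e["CD8"]) and not e["CD4"],
    "NK cells": lambda e: bool(e["NK1.1"]),
    "Macrophages": lambda e: (bool(e["CD11b"]) and e["Gr1"] in ("neg", "low")
                              and bool(e["F4/80"]) and not e["CD11c"]),
    "Neutrophils": lambda e: bool(e["CD11b"]) and e["Gr1"] == "hi",
    "CD11b+CD11c+ cells": lambda e: (bool(e["CD11b"]) and e["Gr1"] in ("neg", "low")
                                     and bool(e["CD11c"]) and not e["F4/80"]),
}


def classify(event: Event) -> List[str]:
    """Return every population whose gate the event satisfies (live cells only)."""
    if not event.get("live", False):
        return []
    return [name for name, gate in GATES.items() if gate(event)]


if __name__ == "__main__":
    event = {"live": True, "CD19": False, "CD4": False, "CD8": False,
             "NK1.1": False, "CD11b": True, "Gr1": "low",
             "F4/80": True, "CD11c": False}
    print(classify(event))  # ['Macrophages']
```

Returning a list rather than a single label leaves room for events that satisfy more than one definition, which the published gates do not explicitly exclude.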

T-cell stimulation and intracellular cytokine staining. Splenocytes from wild-type
and Nlrc3−/− mice were treated with ACK lysis buffer at room temperature
for 1 min to remove red blood cells. Splenocytes were washed, counted and
plated at 2 x 10° cells per well in a 96-well plate coated with 1 μg ml−1 anti-CD3
(145-2C11, Affymetrix eBioscience) and 1 μg ml−1 anti-CD28 (16-0281, Affymetrix
eBioscience). Cells were cultured at 37 °C in the presence or absence of 20 ng ml−1
murine IL-2 (212-12, Peprotech) for 4 days. Brefeldin A (00-4506, Affymetrix 
eBioscience) was added to the media for 3h, followed by washing in PBS, and 
staining with anti-CD4 (14-0042-85, Affymetrix eBioscience) and anti-CD3 
(145-2C11, Affymetrix eBioscience) antibodies on ice for 20 min. Stained cells were 
fixed in 1% paraformaldehyde for 30 min on ice and permeabilized using permea- 
bilization buffer (00-8333-56, Affymetrix eBioscience) according to manufacturer's 
instructions. To detect intracellular cytokines, fixed cells were stained with anti-IFN
(50-7311, Tonbo) and anti-TNF (506322, BioLegend) for 30 min on ice. Flow
cytometry was performed as described above.

siRNA knockdown. Primary ear fibroblasts were transfected with siRNA from
siGENOME SMARTpools using the Neon Transfection System
(MPK5000, ThermoFisher Scientific). The siGENOME SMARTpool siRNA
specific for the gene encoding mouse NLRC3 (M-052823-01, Dharmacon) and a 
control siRNA pool were used. Sequences for siRNA are listed in Supplementary 
Information. After 48h of transfection, cells were stimulated with IGF-1 as 
described above. 

Retroviral transduction. Human or mouse MSCV-NLRC3-IRES-GFP or MSCV- 
IRES-GFP construct was co-transfected with retroviral packaging plasmids 
(pPAM-E and VSV-G) into HEK293T cells using Xfect Transfection Reagents 
(631318, Clontech Laboratories, Inc.). Virus-containing media were collected 
48–72 h later and passed through a 0.45-μm filter. Primary ear fibroblasts
or HCT116 cells were transduced with control or NLRC3-encoding retroviral 
vectors. Cells expressing GFP were selected by flow cytometry. 

Generation of cells expressing NLRC3 and its deletion mutants. Plasmids pVSVg,
pPAM-E and pMIGII encoding the mouse Nlrc3 gene or the mouse Nlrc3 gene
lacking regions encoding the CARD, NACHT or LRR domain were transfected into
L929 cells to generate retroviral stocks. Domains were annotated in accordance
with the NCBI Conserved Domain Database (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi).
Retroviral supernatants were collected 48 h after transfection and
filtered through 0.4-μm filters. L929 cells were infected with the retroviral stocks
in the presence of polybrene to generate cells that stably expressed either wild-type
NLRC3 or NLRC3 lacking the CARD, NACHT or LRR domain.
Co-immunoprecipitation. Cells were collected with ice-cold PBS and lysed 
in lysis buffer composed of 50 mM Tris-HCl pH 7.5, 150mM NaCl, 1% NP-40, 
protease and phosphatase inhibitors. Lysates were cleared of insoluble material by 
centrifugation at 15,000g for 10 min. For immunoprecipitation, cell lysates were 
incubated with 3 μg of primary antibodies at 4 °C for 12–16 h on a rocking platform,
followed by incubation with Protein A/G PLUS-Agarose (sc-2003, Santa Cruz) 
for a further 2h on a rocking platform. The immunoprecipitated products were 
washed three times with lysis buffer and eluted using 2x SDS sample buffer and 
boiled at 100°C for 5 min. 
Statistical analysis. GraphPad Prism 6.0 software was used for data analysis. 
Data are shown as mean ± s.e.m. Statistical significance was determined by t-test
(two-tailed) for two groups or one-way ANOVA (with multiple comparisons tests) 
for three or more groups. P < 0.05 was considered statistically significant. No 
statistical methods were used to predetermine sample size. 
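For readers who want to cross-check summary statistics outside GraphPad Prism, the quantities behind mean ± s.e.m., an unpaired two-tailed Student's t-test (equal variances assumed) and a one-way ANOVA can be computed with the Python standard library, as sketched below. The sketch stops at the t and F statistics: p-values require the t and F distributions (for example scipy.stats.ttest_ind and scipy.stats.f_oneway, which return them directly), and Prism's post hoc multiple-comparison tests are not reproduced. The data in the usage are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance
from typing import Sequence


def sem(values: Sequence[float]) -> float:
    """Standard error of the mean (sample s.d. divided by sqrt(n))."""
    return stdev(values) / sqrt(len(values))


def student_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Unpaired t statistic assuming equal variances (pooled variance)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def one_way_anova_f(*groups: Sequence[float]) -> float:
    """F statistic: between-group mean square over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


if __name__ == "__main__":
    wt, ko = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]   # hypothetical tumour counts
    print(f"mean ± s.e.m. (WT): {mean(wt):.2f} ± {sem(wt):.2f}")
    print(f"t = {student_t(wt, ko):.3f}")
    print(f"F = {one_way_anova_f(wt, ko, [7.0, 8.0, 9.0]):.1f}")
```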

Data availability. Source data for Figs 3a, d, e, h, 4d-f and Extended Data 
Figs 3b, 5a, 6a, b, d, e, 9a, b, d–g, 10a, b have been provided in Supplementary
Fig. 1. All other data supporting the findings of this study are available from the 
corresponding author on request. 
Extended Data Figure 1 | NLRC3 prevents colitis-associated colorectal
tumorigenesis. a, Timeline for azoxymethane (AOM) and DSS treatment.
b, Relative expression levels of the gene encoding NLRC3 in tumour
and non-tumour tissue in the colon of wild-type mice 80 days after
azoxymethane injection. c, Targeting strategy used to generate
Nlrc3−/− mice and PCR analysis for the gene encoding NLRC3 in
wild-type C57BL/6 mice, Nlrc3+/− mice and Nlrc3−/− mice. The primers
P1 (which binds a region between exon 1 and exon 2) and P2 (which
binds a region between exon 3 and exon 4) were designed for PCR1
such that it generates a 4,804-bp PCR fragment for the wild-type allele and
a 2,309-bp fragment for the knockout (KO) allele. However, PCR1 cannot
differentiate heterozygous (HET) from knockout mice because the knockout
2,309-bp fragment outcompetes the wild-type 4,804-bp fragment.
Therefore, we designed primers P3 and P4 for use in PCR2 to amplify a
940-bp fragment from exon 3 to confirm its presence in wild-type and
HET mice and its absence in knockout mice. d, Histological scores of the
colon tissue in wild-type and Nlrc3−/− mice 80 days after azoxymethane
injection. e, Percentages of mice with dysplasia 80 days after injection of
azoxymethane. f, Percentages of mice with adenocarcinoma 80 days after
injection of azoxymethane. g, Haematoxylin and eosin staining of colon
crypts. Scale bar, 100 μm. Each symbol represents an individual
mouse (b, d). ****P < 0.0001; NS, not statistically significant by
one-way ANOVA (b) or two-tailed t-test (d). Data represent two
independent experiments (b, d-g) and are presented as mean ± s.e.m. (b, d).
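The two-PCR genotyping logic in c can be summarized as a small decision rule. The Python sketch below is a hypothetical helper, not part of the original study. It assumes that the bands observed in each reaction have been recorded as sets of fragment sizes in bp, using the sizes given in the legend.

```python
# Expected fragment sizes (bp) stated in the legend
WT_PCR1_BP = 4804    # P1/P2 product from the wild-type allele
KO_PCR1_BP = 2309    # P1/P2 product from the knockout allele
EXON3_PCR2_BP = 940  # P3/P4 product from exon 3 (absent in knockout)


def call_genotype(pcr1_bands, pcr2_bands):
    """Return 'WT', 'HET' or 'KO' from the band sizes seen in PCR1 and PCR2."""
    exon3_present = EXON3_PCR2_BP in pcr2_bands
    if KO_PCR1_BP in pcr1_bands:
        # PCR1 alone cannot separate HET from KO because the shorter knockout
        # fragment outcompetes the wild-type one, so PCR2 decides.
        return "HET" if exon3_present else "KO"
    if WT_PCR1_BP in pcr1_bands and exon3_present:
        return "WT"
    raise ValueError("band pattern does not match any expected genotype")


print(call_genotype({4804}, {940}))
print(call_genotype({2309}, {940}))
print(call_genotype({2309}, set()))
```

Raising an error on an unexpected pattern, such as a wild-type PCR1 band without the exon 3 product, flags a failed reaction rather than silently assigning a genotype.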
Extended Data Figure 2 | NLRC3 dampens intestinal inflammation.
a, Body-weight change of mice pooled from three independent
experiments. b, Images of colon and colon length in wild-type
and Nlrc3−/− mice at day 14. c, Histological scores at day 14. d, Levels of
IL-18 and IL-1β in colon tissue at days 14 and 80. e, f, Levels of IL-6,
TNF, GCSF, KC, MCP-1 and MIP-1α in colon tissue at days 14 and 80.
g, Relative expression of genes encoding IL-6, TNF, GCSF and KC in colon
tissue of untreated wild-type and Nlrc3−/− mice and in wild-type and
Nlrc3−/− mice at day 14. h, Levels of IL-6, GCSF, KC and MIP-1α in sera
of untreated wild-type and Nlrc3−/− mice and in wild-type and Nlrc3−/−
mice at days 14 and 80. Each symbol represents an individual mouse
(b-h). **P < 0.01; ****P < 0.0001; NS, not statistically significant by
one-way ANOVA (a) or two-tailed t-test (b-h). Data represent three
independent experiments (a-h) and are presented as mean ± s.e.m. in a-h.
Extended Data Figure 3 | NLRC3 governs colorectal tumorigenic
susceptibility via inflammatory mediators and immune cells.
a, Relative expression of genes encoding IL-17a, levels of the IL-17 protein,
and relative expression of genes encoding IL-22, IL-23p19, IFNβ and
IFN1 in colon tissue of untreated wild-type and Nlrc3−/− mice and
in wild-type and Nlrc3−/− mice at day 14. b, Immunoblot analysis of
phosphorylated and total IκBα (P-IκBα and T-IκBα), ERK1 and ERK2
(P-ERK1/2 and T-ERK1/2), phosphorylated STAT3 (P-STAT3), and
β-actin (loading control) in colon tissue of wild-type and Nlrc3−/− mice
14 days after injection of azoxymethane (top). The protein band intensity
was normalized to the total protein counterpart and/or β-actin, and
expressed relative to that of wild-type controls, set at 1 (bottom).
c, Gating strategies used to generate data in d. d, Number of macrophages,
CD11b+CD11c+ cells, neutrophils, B cells, CD4+ T cells, CD8+ T cells and
natural killer cells per colon in wild-type and Nlrc3−/− mice at days 8 and
14. e, Splenocytes from wild-type and Nlrc3−/− mice were stimulated with
CD3, CD28 and IL-2, and intracellular staining was performed for IFNγ
and TNF. *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001;
NS, not statistically significant by two-tailed t-test (a, b, d, e). Data are pooled
from two independent experiments (a) or represent one experiment
representative of two independent experiments (b-e) and are presented
as mean ± s.e.m. in a, b, d, e.
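The densitometry normalization described in b takes two steps. First, each phospho-protein band is divided by its reference band (the total-protein counterpart or β-actin). Second, the result is expressed relative to wild-type controls, which are set at 1. The Python sketch below assumes a single reference band per lane and uses the mean wild-type ratio as the reference; the function name and example intensities are hypothetical.

```python
from statistics import mean


def relative_band_intensity(target, reference, is_wild_type):
    """Normalize each lane's target band to its reference band (total protein
    or loading control), then scale so that the wild-type mean equals 1."""
    ratios = [t / r for t, r in zip(target, reference)]
    wt_ratios = [ratio for ratio, wt in zip(ratios, is_wild_type) if wt]
    wt_mean = mean(wt_ratios)
    return [ratio / wt_mean for ratio in ratios]


# Hypothetical intensities: two wild-type lanes followed by two Nlrc3-/- lanes
print(relative_band_intensity([2.0, 4.0, 6.0, 3.0],
                              [1.0, 1.0, 2.0, 1.0],
                              [True, True, False, False]))
```

Dividing by the reference band first corrects for unequal loading between lanes. Scaling to the wild-type mean then makes fold changes directly comparable across blots.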
Extended Data Figure 4 | The inhibitory effect of NLRC3 is more
dominant in intestinal epithelial cells than in haematopoietic cells.
a, Left panel shows colon tumours in, from left to right, wild-type →
wild-type (n = 10); Nlrc3−/− → wild-type (n = 9); wild-type → Nlrc3−/− (n = 8);
Nlrc3−/− → Nlrc3−/− (n = 9) bone-marrow chimeric mice at day 80.
The percentages of tumours of each size are shown on the right. b, Left panel
shows colon tumours in littermate Nlrc3fl/fl (n = 8), LysMcreNlrc3fl/fl
(n = 11), Vav1creNlrc3fl/fl (n = 9), Vil1creNlrc3fl/fl (n = 7) and Nlrc3−/−
(n = 8) mice at day 80. The percentages of tumours of each size of each
mouse type are shown on the right. c, Relative expression of genes encoding
LGR5, WNT1, β-catenin (Ctnnb1), Axin2, TCF4, TCF7, and LEF1 in
colon tissue of untreated wild-type and Nlrc3−/− mice or in wild-type and
Nlrc3−/− mice at day 14. d, Immunohistochemical staining of β-catenin in
colon tissue of wild-type and Nlrc3−/− mice. Scale bar, 20 μm. Each symbol
represents one mouse (c). NS, not statistically significant by two-tailed
t-test. Data represent two independent experiments and are presented as
mean ± s.e.m. in c.


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


LETTER 


[Extended Data Figure 5 image. a, Day 8: immunoblots of P-mTOR, P-S6, P-4E-BP1, P-AKT (S473), P-IκBα and GAPDH in WT and Nlrc3−/− colon tissue, with densitometry. b, c, cytokine levels (pg/ml), including MIP-1α.]
Extended Data Figure 5 | Dysregulation of mTOR signalling precedes
dysregulation of NF-κB signalling. a, Immunoblot analysis of
phosphorylated mTOR, S6, 4E-BP1, AKT and IκBα, and GAPDH (loading
control) in colon tissue of wild-type and Nlrc3−/− mice at day 8 (left). The
protein band intensity was normalized to GAPDH and expressed as a level
relative to that of the wild-type controls, set at 1 (right). b, Levels of IL-18,
IL-1β, IL-6, TNF, GCSF, KC, MCP-1 and MIP-1α in colon tissue at day 8.
c, Levels of IL-18, IL-1β, IL-6, TNF, GCSF, KC, MCP-1 and MIP-1α in sera.
Each symbol represents an individual mouse (b, c). *P < 0.05; **P < 0.01;
NS, not statistically significant by two-tailed t-test (a-c). Data represent
two independent experiments and are presented as mean ± s.e.m. in a-c.
For gel source data, see Supplementary Fig. 1.
Extended Data Figure 6 | NLRC3 regulates mTOR activity.
a, Immunoblot analysis of phosphorylated AKT, total AKT,
phosphorylated 4E-BP1 and GAPDH (loading control) in the colon
tissue of wild-type, Nlrc3+/− and Nlrc3−/− mice at day 14. The protein-band
intensity was normalized to the total protein counterpart and/or the
loading control and expressed relative to that of wild-type controls,
set at 1 (right). b, Immunoblot analysis of phosphorylated AKT, S6K,
S6 and 4E-BP1, and GAPDH (loading control) in wild-type fibroblasts
transfected with a control siRNA or Nlrc3 siRNA and left untreated or treated
with IGF-1 (top). Densitometry analysis as in a over 120 min (bottom).
c, Relative expression of the gene encoding NLRC3 in wild-type fibroblasts
transfected with a control siRNA compared with wild-type fibroblasts
transfected with an Nlrc3 siRNA. d, Immunoblotting of phosphorylated
S6K, S6, 4E-BP1 and AKT, and total AKT, in primary fibroblasts either left
untreated or treated with IGF-1 (top). Densitometry analysis as in a
(bottom). e, Immunoblotting of phosphorylated AKT and mTOR, and
β-actin (loading control) in primary fibroblasts either left untreated or
treated with IGF-1 (left). Densitometry analysis as in b (right). *P < 0.05;
**P < 0.01; ***P < 0.001 by one-way ANOVA (a, b, d, e). Data are
from one experiment representative of two (a, c) or four independent
experiments (b, d and e) and are presented as mean ± s.e.m. in a, b, d, e.
For gel source data, see Supplementary Fig. 1.
Extended Data Figure 7 | NLRC3 regulates mTOR activity in
fibroblasts. a-d, Immunofluorescent staining of phosphorylated S6 (a),
4E-BP1 (b), AKT (c), and AKT (d) in primary fibroblasts either left
untreated or treated with IGF-1 for 30 min is shown in the left panels.
Quantification of the fluorescence intensity in each cell (n = 150 or more)
is shown in the right panels. ADU, average density unit. Scale bar,
20 μm. Each symbol represents an individual cell. ****P < 0.0001;
NS, not statistically significant by two-tailed t-test. Data represent one
experiment representative of two independent experiments and are
presented as mean ± s.e.m.
Extended Data Figure 8 | NLRC3 prevents colorectal cancer in an
ApcMin/+ model of tumorigenesis. a, Images of colon tumours (left),
tumour number and colon length (middle), and size (right) of 120-day-old
littermate ApcMin/+ and ApcMin/+Nlrc3−/− mice. b, Percentage of mice
with dysplasia (left), total histology scores (middle), and histology scores
of different parts of colon and different parameters (right) of mice in a.
c, Haematoxylin and eosin (H&E, top), Ki67 (middle) and phosphorylated S6
**P < 0.0001; NS, not statistically significant by two-tailed t-test
(a, b, d). Data represent two independent experiments and are presented
as mean ± s.e.m. in a, b, d.
Extended Data Figure 9 | NLRC3 disrupts the assembly of the PI3K
heterodimeric complex. a, b, Immunoprecipitation and comparative
analysis of the PI3K signalling complex levels between wild-type and
Nlrc3−/− primary mouse embryonic fibroblasts (MEFs; a) and
bone-marrow-derived macrophages (BMDMs; b). c, Schematic representation
of the generation of deletion mutants of NLRC3. d, Loading inputs for e-g.
e, Immunoprecipitation of wild-type NLRC3 and its deletion mutants.
f, g, Immunoblotting analysis of the interaction of NLRC3 and its mutants
with the PI3K-p110 (f) and PI3K-p85 (g) subunits. Data represent two
independent experiments. For gel source data, see Supplementary Fig. 1.
Extended Data Figure 10 | NLRC3 negatively regulates TLR4-induced
activation of the PI3K-AKT-mTOR pathway. a, Immunoblot analysis
of phosphorylated AKT (at Thr308 and Ser473), total AKT and β-actin
(loading control) in wild-type and Nlrc3−/− bone-marrow-derived
macrophages (BMDMs) left untreated or treated with LPS (top). The
protein-band intensity was normalized to β-actin and expressed relative
to that of wild-type controls, set at 1 (bottom). b, Immunoblot analysis of
phosphorylated mTOR, phosphorylated 4E-BP1 and β-actin
(loading control) in wild-type and Nlrc3−/− BMDMs left untreated or
treated with LPS (top). Densitometry analysis as in a (bottom).
c, A model of the role of NLRC3 in the negative regulation of the
PI3K-AKT-mTOR pathway.
*P < 0.05; **P < 0.01; ***P < 0.001 by two-tailed t-test (a, b). Data are
from one experiment representative of four independent experiments
and are presented as mean ± s.e.m. in a and b. For gel source data,
see Supplementary Fig. 1.
Mechanism of early dissemination and metastasis in Her2+ mammary cancer

Kathryn L. Harper, Maria Soledad Sosa, David Entenberg, Hedayatollah Hosseini, Julie F. Cheung, Rita Nobre,
Alvaro Avivar-Valderas, Chandandeep Nagi, Nomeda Girnius, Roger J. Davis, Eduardo F. Farias, John Condeelis,
Christoph A. Klein & Julio A. Aguirre-Ghiso


Metastasis is the leading cause of cancer-related deaths; metastatic 
lesions develop from disseminated cancer cells (DCCs) that can 
remain dormant!. Metastasis-initiating cells are thought to originate 
from a subpopulation present in progressed, invasive tumours”. 
However, DCCs detected in patients before the manifestation of 
breast-cancer metastasis contain fewer genetic abnormalities than 
primary tumours or than DCCs from patients with metastases*>. 
These findings, and those in pancreatic cancer® and melanoma’ 
models, indicate that dissemination might occur during the early 
stages of tumour evolution**°. However, the mechanisms that might 
allow early disseminated cancer cells (eDCCs) to complete all steps 
of metastasis are unknown’. Here we show that, in early lesions in 
mice and before any apparent primary tumour masses are detected, 
there is a sub-population of Her2+p-p38lop-Atf2loTwist1hiE-cadlo
early cancer cells that is invasive and can spread to target organs.
Intra-vital imaging and organoid studies of early lesions showed
that Her2+ eDCC precursors invaded locally, intravasated and
lodged in target organs. Her2+ eDCCs activated a Wnt-dependent
epithelial-mesenchymal transition (EMT)-like dissemination 
program but without complete loss of the epithelial phenotype, 
which was reversed by Her2 or Wnt inhibition. Notably, although 
the majority of eDCCs were Twist1hiE-cadlo and dormant, they
eventually initiated metastasis. Our work identifies a mechanism for 
early dissemination in which Her2 aberrantly activates a program 
similar to mammary ductal branching that generates eDCCs that 
are capable of forming metastasis after a dormancy phase. 

We investigated whether the loss of tumour-suppressive p38 sig- 
nalling and gain of oncogenic Her2 upregulation, which induce 
resistance to anoikis (apoptosis induced by lack of correct cell- 
extracellular matrix attachment) in early cancer cells!®, might 
activate a disseminating phenotype. eDCCs were defined as those 
originating at times when the MMTV-Her2 (Her2 under the con- 
trol of mouse mammary tumour virus) mice had normal ductal 
architecture, hyperplasia and mammary intraepithelial neoplasia, 
as confirmed by histopathology'', but no detectable tumours® 
(Extended Data Fig. 1a-c). Her2 causes E-cadherin (E-cad, also
known as cadherin 1) downregulation”, whereas p38 can maintain
E-cadherin expression’’. We found that more than 85% of Her2+
cells were E-cadlo (Fig. 1a), and in Her2 (wild-type gene) and
Her2-T (mutant active gene) tissues of early lesions, E-cadlo early
cancer cells were more frequently (70-75%) phosphorylated (p-)
ATF2lo (Fig. 1b and Extended Data Fig. 1d, e). In each duct, 60-70%
of all cells from early lesions were positive for membrane β-catenin
(the inactive form) (Extended Data Fig. 2a). However, when analysing
Her2+ cells alone, only 30% of cells showed membrane localization
of β-catenin (Fig. 1c). Overall, these results suggest that Her2+ cells
display a loss of E-cadherin- and β-catenin-based junctions and are
p-ATF2lo.

Overt MMTV-Her2 tumours showed low levels of E-cadherin, phos- 
phorylated p38 (p-p38) and p-ATF2, while maintaining high p-ERK1/2 
(p-ERK1 and p-ERK2) levels (Extended Data Fig. 2b-d), suggesting 
that a Her2+p-ATF2loE-cadlo profile is present in early lesions and
primary tumours. We found that only HER2− human ductal carcinoma
in situ (DCIS) lesions retained both high p-ATF2 expression levels and
organized E-cadherin junctions, whereas HER2+ DCIS lesions showed
low expression levels of p-ATF2 and E-cadherin (Extended Data
Fig. 2e, f). The HER2+p-p38lop-ATF2lo profile was also present in
larger human HER2+ breast carcinomas, whereas only HER2− tumours
showed strong nuclear staining for p-p38 and p-ATF2 compared to
HER2+ lesions (Extended Data Fig. 2g). We conclude that, in early
human and mouse cancer cells, HER2 upregulation is associated with
a p-p38lop-ATF2loE-cadlo profile that persists in large primary tumours.

The above data suggest that p38 might prevent an invasive phenotype,
and therefore that Her2+E-cadlop-p38lop-ATF2lo cells from early
lesions might be able to disseminate. MMTV-Her2 early lesions and
MCF10A (a human mammary gland cell line) cells overexpressing
HER2 (denoted as MCF10A-HER2) grown in 3D-organoid cultures
show misshapen architecture and occasional single-cell invasion from
the organoids (Extended Data Fig. 3a, b). Treatment of MCF10A-HER2
cells (Extended Data Fig. 3c, d) or MMTV-Her2 organoids (Extended
Data Fig. 3e) with lapatinib, which reduced p-S6 levels through inhibition
of epidermal growth factor receptor (EGFR) and Her2 (Extended
Data Fig. 3d), or with HER2 (also known as ERBB2) short interfering
RNAs (siRNAs) restored E-cadherin junctions and increased p-ATF2
levels (Extended Data Fig. 3c, e). Inhibition of EGFR signalling to AKT
using AG1478 also increased the number of E-cadherin junctions and
ATF2 phosphorylation (Extended Data Fig. 3c-e). Further, a pan-PI3K
inhibitor (GDC-0941) or an AKT inhibitor (MK-2206) that reduced
p-S6 levels also increased p-ATF2 levels (Extended Data Fig. 3d, e).
Basal and lapatinib-stimulated nuclear p-ATF2 levels were completely
eliminated by treatment with the p38α/β inhibitor SB203580 (Extended
Data Fig. 3f) in MCF10A-HER2 cells. We conclude that Her2 and EGFR signalling
through PI3K and AKT inhibits p38 activity in cells from early lesions,
and that increased p-ATF2 levels after HER2 and EGFR inhibition
depend on p38α and p38β (p38α/β) (Extended Data Fig. 3a-f).

MMTV-Her2 and MCF10A-HER2 organoids showed outward 
invading single cells as observed using live microscopy (Supplementary 
Video 1 and Fig. 1d, e), which displayed projections rich in F-actin, 
loss of E-cadherin junctions and focalized loss of laminin-V deposition
(Fig. 1e and Extended Data Fig. 3g). Her2+ invading cells from
Figure 1 | E-cadherin, Her2 and p-ATF2 levels and function in
early lesion cells. a, Top, MMTV-Her2 early lesion tissue sections
co-stained for the indicated antigens. Bottom, intra-ductal heterogeneity
of Her2hiE-cadlo cells. Inset, magnified view of the boxed region.
Arrowheads, Her2hiE-cadlo cells; arrows, Her2loE-cadhi cells. Right,
percentage of Her2+ cells that were E-cadlo or E-cadhi (n = 20 ducts;
n = 2 mice). ***P < 0.0001. b, Representative images of E-cadhip-ATF2hi
(top) and E-cadlop-ATF2lo (bottom) ducts in MMTV-Her2 early lesion
tissues. Arrowhead, p-ATF2hiE-cadhi cell; arrow, p-ATF2loE-cadlo cell.
Scale bars, 25 μm and 10 μm (inset) (a and b). c, MMTV-Her2 early
lesion tissue sections stained for Her2 and β-catenin. Arrow, Her2+ cells
with low membrane-associated β-catenin (β-catMEMlo); arrowhead,
Her2−β-catMEMhi cells. Scale bar, 10 μm. Graph, percentage of early lesion
cells with β-catMEMlo that were Her2− or Her2+ (Her2, *P = 0.035 and

MMTV-Her2 early lesions also showed downregulated E-cadherin
in vivo (Fig. 1f). Notably, approximately 80% of E-cadlo invading cells
were Her2+ and positive for cytokeratin 8 and 18 (CK8/18+), suggesting
that epithelial identity was retained despite E-cadherin junction
loss; only about 20% of CK8/18+ invading cells were negative for
Her2 (Extended Data Fig. 3h). When we inhibited p38α/β activity in
MMTV-Her2 and MCF10A-HER2 organoids using SB203580, we
found that it stimulated an invasive phenotype (Fig. 1d, e and Extended
Data Fig. 3g). We conclude that Her2+E-cadlop-p38lo early cancer cells
display an invasive phenotype without the loss of CK8/18 expression.

We next monitored early dissemination using high-resolution 
intra-vital imaging’ of MMTV-Her2-cyan fluorescent protein (CFP) 
transgenic mice (Fig. 2). Using a new mammary imaging window and 
two-photon imaging in vehicle-treated Her2-CFP mice, we found that 
although no invasion was detected at 10 weeks (normal ductal structure) 
(Fig. 2a, left and Supplementary Video 2), at 15 and 18 weeks 
local invasive CFP+ cells were detected invading into the stroma
(Fig. 2a, middle, right and Supplementary Videos 3, 4). CFP faithfully
reports Her2-overexpressing cells, as confirmed using
double Her2 and CFP immunofluorescence (more than 90% of Her2+
cells were CFP+; Extended Data Fig. 4a). p38α/β inhibition with
SB203580 stimulated invasion, and CFP+ cells were now found
intravasating (Fig. 2b-d and Supplementary Videos 5-7). 3D reconstruction
of the videos showed unambiguously how individual cells from the 
ductal early lesion tissues entered the lumen of blood vessels (Fig. 2b 
(inset), c, d and Supplementary Videos 7, 8). 

The intravasation documented in the videos led to successful 
dissemination, since we found Her2+CK8/18+ eDCCs in the blood and
bone marrow, and Her2+ eDCCs in lungs, of 14-18-week-old mice
(Extended Data Fig. 4b-j and Extended Data Table 1). Her2 detection
by immunohistochemistry or immunofluorescence microscopy
using two independent antibodies (mouse and rabbit) showed similar
patterns of Her2 staining that were absent with the pre-immune IgG
or in non-oncogene-expressing FVB mice (Extended Data Fig. 4c-g).
Systemic inhibition of p38α/β for 2 weeks also substantially increased
Her2-T, *P = 0.0008). MMTV-Her2 n = 30 ducts per mouse, n = 3 mice;
MMTV-Her2-T n = 10 ducts per mouse, n = 2 mice. d, MMTV-Her2
early lesion organoids treated for 48 h with DMSO or 5 μM SB203580.
BF, bright field. Bottom images indicate magnified, boxed regions. Scale
bars, 15 μm (left), 40 μm (right). e, MMTV-Her2 organoids stained for the
indicated antigens. Arrowheads, invasive cells; arrow, intact laminin-V
layer. Scale bars, 25 μm (e), 10 μm (insets 1 and 2). f, MMTV-Her2
early lesion sections co-stained for the indicated antigens. Arrows,
Her2+E-cadlo invading cells. Top numbers, percentage of Her2+
E-cadlo invading cells in early lesion sections, n = 58 cells per section,
n = 2 mice. Graph, percentage E-cadlo invading cells in MCF10A-HER2
(10A-Her2; see also Extended Data Fig. 3a) or MMTV-Her2 organoids.
n = 20 MCF10A-HER2 organoids. a, c, one-sided Mann-Whitney U-test.
Data are median ± s.e.m.


the numbers of early circulating cancer cells in blood (Extended 
Data Fig. 4h) and eDCCs in bone marrow and lungs (Extended Data 
Fig. 4i, j). 

Treatment of MMTV-Her2 or MCF10A-HER2 early lesion organoids
with SB203580 or siRNAs targeting p38α caused a loss of
E-cadherin junctions (Fig. 3a, h (top) and Extended Data Fig. 5a-c).
Genetic and pharmacological p38 inhibition reduced total membrane-localized
inactive β-catenin and increased active β-catenin (unphosphorylated
β-catenin, detected with a conformation-specific antibody’)
(Fig. 3b and Extended Data Fig. 5a, b, d, e). Inhibition of p38α/β in
MCF10A-HER2 cells led to upregulation of AXIN2 mRNA, a canonical
target of β-catenin’® (Extended Data Fig. 5f). SNAI1 (also known as
SNAIL) and TWIST1 were also upregulated in MCF10A-HER2 3D
organoids treated with siRNAs targeting p38α (also known as MAPK14)
or ATF2 (Extended Data Fig. 5g). Notably, systemic p38α/β inhibition
in MMTV-Her2 mice induced a strong loss of E-cadherin junctions,
a concomitant increase of nuclear β-catenin and a strong induction of
Twist1 in early lesion tissues as detected by immunohistochemistry
and quantitative PCR (Fig. 3c-f, Extended Data Fig. 5h, i). In wild-type
FVB mice or MCF10A organoids treated with or without SB203580,
or in wild-type compared to Mkk3−/−Mkk6+/− (also known as
Map2k3−/−Map2k6+/−) C57BL/6 mice (Mkk3 and Mkk6 activate all
p38 isoforms), p38 inhibition caused a loss of E-cadherin junctions
(Extended Data Fig. 5j-l), but did not cause a loss of membrane
β-catenin, CK8/18 expression or α-smooth-muscle-actin-positive
myoepithelial cell organization (Extended Data Fig. 5m). Our data
suggest that p38α and ATF2 inhibit the activity of β-catenin, preventing
successful dissemination of Her2+ early lesion cells. Our data also
suggest that in the absence of Her2 expression, p38 inhibition was
insufficient to change β-catenin localization.

HER2 expression or p38α/β inhibition alone induced a 14-gene
EMT signature that included non-canonical WNT ligands and EMT 
transcription factors and was further upregulated by p38 inhibition in 
MCF10A-HER2 cells (Fig. 3g and Extended Data Fig. 6a, b). CDH1 
mRNA was also downregulated by p38 inhibition in MCF10A-HER2 
Figure 2 | Intra-vital imaging of eDCC precursor intravasation.
a, Her2-CFP (blue) ducts from 10- (left, Supplementary Video 2),
15- (middle, Supplementary Video 3) and 18- (right, Supplementary
Video 4) week-old early lesions; scale bars, 38, 5 and 56 μm, respectively.
Red, rhodamine-dextran-labelled vasculature. Dotted ellipses, ducts.
Right inset: direction of movement of an eDCC at 5 time points (2 min per
panel). Dotted arrow, direction of movement; scale bars in insets, 7.8 μm.
b, Her2-CFP early lesion ducts from Supplementary Video 5 and
Supplementary Video 6 (inset) treated with SB203580 for 2 weeks. Middle
image, higher-resolution intra-vital video of the area boxed in the left
image. Scale bars, 25.5 μm and 5.2 μm (boxed area). c, Top, 3D computer
reconstruction of the video in b (boxed area). Bottom, rotated projection
showing invasion (yellow) of early lesion cells (CFP, cyan) into blood
vessels (red) (Supplementary Video 7). Scale bars, 7 μm (top) and 10 μm
(bottom). d, Sequence following an early lesion cell (Supplementary Video 8)
as it intravasates in mice treated with SB203580 for 4 weeks (yellow cell
inside the red blood vessel (BV); 2.2 min per panel).


organoids (Extended Data Fig. 6c). When we determined whether
WNT ligands were functionally linked to the HER2-driven EMT-like
program, we found that MCF10A-HER2 cells treated with SB203580
displayed AXIN2 mRNA induction, which was reversed by overexpression
of the WNT-ligand antagonist SFRP1 (Extended Data Fig. 6d, e).
Recombinant soluble WNT3A (ref. 17 and Methods) also stimulated
expression of AXIN2 in MCF10A-HER2 cells, and this induction was
substantially inhibited by expression of a constitutively active p38α
kinase (p38α D176A,F327S mutant) cDNA!° (Extended Data Fig. 6f).
Increased invasion, loss of E-cadherin junctions and loss of membrane-localized
β-catenin after p38α/β inhibition were reversed in SFRP1-expressing
cells (Extended Data Fig. 6g, h). In addition, the canonical
Wnt inhibitor DKK1 also reversed the loss of E-cadherin induced by
p38 inhibition in MMTV-Her2 primary organoids (Fig. 3h). Twist1
was also readily detected as a nuclear-cluster signal using immunofluorescence
microscopy in both early lesions and primary tumour tissue
in the MMTV-Her2 model (Extended Data Fig. 6i).
Analysis of these tissues showed that Twist1 was expressed in the
majority of early lesion cells and that no major changes in the expression
of Twist1 occurred with progression to primary tumours. Overall,
our data suggest that Her2 and p38 operate antagonistically and that
Her2+p-p38lo cells from early lesions may rely on both canonical and
non-canonical Wnt ligands to induce an EMT-like program associated
with TWIST1 upregulation.

The majority of solitary Her2+ eDCCs (1-5 cells) in lungs were negative
for, or weakly stained for, the G1 exit marker p-Rb (p-retinoblastoma
protein) compared to growing micro- and macro-metastases in
mice carrying primary tumours (Fig. 4a, b, d). Detection of p-Ser10 
histone-H3 (a G2/M marker) also showed that most eDCCs were 
Figure 3 | An EMT-like program in Her2+ early lesion cells.
a, E-cadherin staining in early lesion MMTV-Her2 organoids transfected
with control or p38α-targeting siRNA. Scale bars, 25 μm. Graph,
percentage E-cadhi organoids; n = 30 organoids per treatment; *P = 0.038.
b, Early lesion MMTV-Her2 organoids treated for 48 h with SB203580 and
stained for total β-catenin; nuclei were counterstained with DAPI.
Grey-scale panels (1, 2) denote zoomed images of boxed areas showing
membrane- and cytosolic-localized β-catenin. Scale bars, 50 μm. Graph,
percentage β-catMEM+ organoids; n = 10 organoids per treatment;
*P = 0.034. c, Early lesion tissue sections from MMTV-Her2 mice treated
for 2 weeks with SB203580 stained for E-cadherin and β-catenin. Scale
bars, 20 μm (top) and 10 μm (bottom). Arrows, E-cadherin (top left)
and lack of signal (top right), membrane-bound β-catenin (bottom left)
or nuclear β-catenin (bottom right). d, e, Quantification
of images in c. C, control. SB, SB203580. d, Percentage of E-cadhi ducts.
n = 3 animals; *P = 0.028. e, Percentage of cells with nuclear β-catenin per
duct. n = 36 ducts per 3 animals; ***P = 0.0002. f, Twist1 mRNA levels
in MMTV-Her2 early lesion tissues obtained from mice treated as
in c. Values, fold change over control (DMSO) normalized to Gapdh.
n = 3 mice per treatment; **P = 0.0042. g, Heat map of EMT-related genes
upregulated >2-fold (biological triplicate) in MCF10A and MCF10A-HER2
organoids treated for 6 days with or without SB203580 (5 μM).
Green, control values set to 1; red, fold change over control. h, Early lesion
MMTV-Her2 organoids were treated for 2 days with SB203580 (5 μM) as
well as 500 ng ml−1 DKK1 and stained for E-cadherin. Bottom numbers,
percentage of E-cadhi organoids; n = 10 organoids per treatment, biological
duplicates. *P < 0.01 (DMSO −DKK1 versus SB203580 −DKK1 and SB203580
−DKK1 versus SB203580 +DKK1); not significant (DMSO −DKK1
versus DMSO +DKK1; DMSO +DKK1 versus +DKK1 and SB203580).
Scale bars, 25 μm. a, b, f, h, one-sided unpaired t-test; d, e, one-sided
Mann-Whitney U-test. Data are mean ± s.e.m.


non-proliferative (Fig. 4d and Extended Data Fig. 6j). DCCs found in
animals bearing overt tumours were termed DCCs because we could
not distinguish their early lesion or primary tumour origin. Animals
with overt tumours that were strongly p-Rb+ (Extended Data Fig. 6k) and
lungs bearing proliferative (strongly p-Rb+) micro- (6-50 Her2+ cells) and
macro- (more than 50 Her2+ cells) metastases (Fig. 4b) still had numerous
quiescent p-Rb−Her2+ DCCs (Fig. 4a, d). We conclude that eDCCs
are primarily p-Rb−p-H3− and that, even in animals bearing metastases,
more than 60% of single cells or clusters of fewer than 5 cells are non-proliferative.
Notably, the vast majority of solitary Her2+ eDCCs in lungs were
negative for E-cadherin (Fig. 4e). However, Her2+ DCCs (in animals
with overt primary tumours) were positive for E-cadherin in approximately
48% of the population (Fig. 4e). Also, close to 100% of the
solitary Her2+ eDCCs showed high Twist1 expression, whereas only
30% of solitary Her2+ DCCs in animals with primary tumours were
positive for Twist1 (Fig. 4f and Extended Data Fig. 6l). We conclude that
most Her2+ eDCCs are quiescent and upregulate Twist1, whereas more
than half of the DCCs during primary tumour stages downregulate
Twist1 expression, suggesting that reactivation might be linked to a
mesenchymal-epithelial transition’®.
Figure 4 | eDCC characterization and metastatic potential. a, Her2, p-Rb and DAPI immunofluorescence detection in eDCCs and DCCs in lung sections from MMTV-Her2 mice. Scale bars, 10 µm. b, Her2 and p-Rb levels in spontaneous metastasis in MMTV-Her2 mice carrying autochthonous primary tumours. Scale bars, 10 µm. c, Her2, p-Rb and DAPI detection in micro-metastasis generated from early lesion mammospheres. Scale bars, 10 µm. d, Left, percentage of positive or negative p-Rb or phospho-Ser10 histone H3 (p-H3) and Her2+ solitary eDCCs or DCCs; n = 3 animals per group; mean ± s.e.m.; **P = 0.0087; one-sided, unpaired t-test. Right, percentage of positive or negative p-Rb cells within spontaneous macro-metastases (n = 3 lesions) in Her2 mice carrying overt autochthonous primary tumours (PT) or in metastases derived from MMTV-Her2 early lesion mammospheres (MS) (n = 9 micro-, n = 3 macro-metastases). *P = 0.021, one-sided, unpaired t-test; mean ± s.d. from technical replicates. e, Her2, E-cadherin and DAPI signal in eDCCs or DCCs; bottom numbers, percentage of E-cad− (top) or E-cad+ (bottom) DCCs; n > 100 DCCs from 3 mice. Scale bars, 5 µm. f, Left, Her2+ (H+) and Twist1+ (T+) eDCCs; right, Her2+Twist1− (T−) DCCs. Graph, percentage of cells with the indicated profiles. n = 500 cells; 4 animals per group; **P < 0.05; one-sided Mann-Whitney U-test; median and values from individual animals. Scale bars, 10 µm. g, Haematoxylin and eosin staining of lung macro-metastasis produced by MMTV-Her2 mammospheres (MS) or tumourspheres (TS). Scale bars, 150 µm. Graph, percentage incidence of primary tumours (PT) and metastases (M) from early lesion mammospheres and primary tumourspheres. n = 15 (MS) and 13 (TS) animals.
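The legends above rely on one-sided unpaired t-tests and one-sided Mann-Whitney U-tests on very small groups (n = 3-4 animals). A minimal Python sketch of both statistics follows, assuming the usual textbook definitions: the U statistic counts pairwise "wins" (ties count one half), its one-sided P value is obtained exactly by enumerating every split of the pooled values, and the t statistic uses a pooled variance. The paper does not say which software or P-value method was used; the function names and example values are hypothetical, not data from Fig. 4.

```python
from itertools import combinations
from math import comb
from statistics import mean, variance


def u_statistic(x, y):
    """Mann-Whitney U for x versus y: each pair with x > y scores 1, ties 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_one_sided_mwu(x, y):
    """Exact one-sided P value for the alternative that x tends to exceed y.

    Every split of the pooled values into groups of the original sizes is
    enumerated, so this is only practical for small groups (e.g. n = 3-4).
    """
    pooled = list(x) + list(y)
    k = len(x)
    observed = u_statistic(x, y)
    extreme = 0
    for idx in combinations(range(len(pooled)), k):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [v for i, v in enumerate(pooled) if i not in chosen]
        if u_statistic(gx, gy) >= observed:
            extreme += 1
    return observed, extreme / comb(len(pooled), k)


def unpaired_t(x, y):
    """Student's t statistic (pooled variance) and its degrees of freedom."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / (pooled_var * (1 / nx + 1 / ny)) ** 0.5
    return t, nx + ny - 2


if __name__ == "__main__":
    # Hypothetical per-animal percentages (4 animals per group).
    group_a = [90, 95, 98, 100]
    group_b = [20, 25, 30, 35]
    u, p = exact_one_sided_mwu(group_a, group_b)
    print(f"U = {u}, exact one-sided P = {p:.4f}")  # U = 16.0, P = 1/70
    t, df = unpaired_t(group_a, group_b)
    print(f"t = {t:.2f} on {df} degrees of freedom")
```

With four animals per group there are only 70 possible splits, so the smallest attainable exact one-sided P value is 1/70 (about 0.014). Recent SciPy versions offer the same tests as `scipy.stats.mannwhitneyu(x, y, alternative='greater', method='exact')` and `scipy.stats.ttest_ind(x, y, alternative='greater')`.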


The treatment of MMTV-Her2 mice with p38 inhibitors for 2 weeks during early lesion stages did not stimulate the dormant eDCCs in the lungs to proliferate, as measured by p-Rb staining (Extended Data Fig. 6m). However, if p38 was inhibited for 2 weeks during overt primary tumour stages, DCCs responded to the p38 inhibitor and the proportion of p-Rb+ cells increased (Extended Data Fig. 6m). These data suggest that, in agreement with Twist1 upregulation in response to p38 inhibition, the non-proliferative state of Twist1-high eDCCs is independent of p38α/β activation. However, DCCs in animals with overt tumours regain sensitivity to p38α/β inhibition, as observed in other models¹⁹.

We next analysed the tumour- and metastasis-initiating capacity of pure cancer cells from early lesions. We prepared sphere cultures from MMTV-Her2 early lesions or from overt primary tumour tissues (Extended Data Fig. 7a) and found that early lesion mammospheres were more efficient at generating secondary spheres than primary tumoursphere cells (Extended Data Fig. 7b). However, after orthotopic injection into nude mouse mammary fat pads (approximately 300 spheres per site), 100% of primary tumourspheres formed primary tumours within 4-12 weeks (Fig. 4g), whereas Her2+ early lesion mammospheres did not produce obvious growing tumours (Fig. 4g and Extended Data Fig. 8); in three animals, Her2− early lesion mammospheres formed small nodules at two months but these entered stasis. The greater tumorigenic capacity of primary tumourspheres relative to early lesion spheres correlated with enhanced p-ERK1/2 and p-S6 levels in the former (Extended Data Figs 7d, 8). Notably, Her2+ early lesion mammospheres produced lung metastases that were Her2+ and p-Rb+, confirming their early lesion MMTV-Her2 origin and their ability to proliferate (Fig. 4c, d, g and Extended Data Fig. 7c). Spontaneous macro-metastases in MMTV-Her2 mice bearing autochthonous tumours were Her2+ and p-Rb+ (Fig. 4b, d), and macro-metastases derived from MMTV-Her2 early lesion mammospheres were also positive for p-Rb (Fig. 4c, d). Her2+ tumourspheres produced Her2+ DCCs (Extended Data Fig. 7c) and also gave rise to macro-metastases, but at a lower incidence (around 16%) (Fig. 4g).

Mammospheres re-implanted in 3D matrices and subsequently imaged were substantially more invasive than tumourspheres, which remained globular (Extended Data Fig. 7e). We also found that experimental metastasis incidence from single-cell suspensions was 100% in both groups, and there was no significant difference in the ability of early lesion or primary tumour cancer cells to produce metastatic nodules (Extended Data Fig. 7f). Our data suggest that Her2+ early lesion cells that downregulate p38 signalling and activate an EMT-like response, while largely non-tumorigenic in orthotopic sites, successfully activate invasive programs that allow for efficient dissemination and metastasis formation. The latter does not seem to depend only on the ability to colonize lungs (Fig. 4c, g and Extended Data Fig. 7c, f).

Our findings identify a molecular mechanism of early dissemination. We propose that a subpopulation of Her2+ CK8/18+ Wnt-high p-p38-low Twist1-high E-cad-low early cancer cells can disseminate and metastasize (Extended Data Fig. 9). Early lesion cells are more invasive than primary-tumour-derived cells, display more stemness, and can intravasate and lodge in secondary organs. Notably, in addition to the work presented here, Hosseini et al.³¹ also show that eDCCs have metastasis-initiating capacity, which was not associated with high tumorigenic capacity or enhanced lung-colonizing potential. Analysis of the ERK1/2 and AKT-mTOR pathways showed low activation of these pathways in early lesion cancer cells, suggesting that they are not yet in a ‘growth mode’. This suggests a new function for Her2: before stimulating growth in early lesions, it activates programs of stemness, motility and invasion, similar to those observed during branching morphogenesis²⁰, that support early dissemination and metastasis.

A few studies have shown that early dissemination occurs in mammary-cancer⁸, pancreatic-cancer⁶ and melanoma⁷ models and in patients²¹, suggesting that our findings are not a peculiarity of the Her2 model. The majority (around 98%) of Her2+ single eDCCs, once in the lungs, were Twist1-high but E-cadherin-low and predominantly dormant. This suggests that eDCC dormancy might be linked to an EMT program, as proposed by others²²,²³. However, eDCCs still expressed CK8/18, suggesting that a partial EMT is sufficient for early dissemination, dormancy and metastasis formation. This EMT program in eDCCs may allow cells to interconvert between dormant and proliferative states, as transient Twist1 expression induces stem-cell programs²⁴ whereas a full EMT may block metastasis²²,²⁵. Whether changes in Twist1 and E-cadherin expression control reactivation in the lungs remains unknown.

Our work also reveals an unexpected role for p38α/β kinases²⁶ and ATF2 (ref. 27) in antagonizing Her2 signalling early in cancer progression. p38-mediated¹³ regulation of E-cadherin junctions and/or ATF2-mediated blockade of β-catenin activity²⁸ may explain how these proteins block early dissemination. eDCC precursors showed low p38 activation, and eDCCs were not stimulated to proliferate after systemic p38α/β inhibition, which stimulated expansion of DCCs in other models¹. Thus, eDCC dormancy seems to be p38α/β independent.


22/29 DECEMBER 2016 | VOL 540 | NATURE | 591 


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


LETTER 


By contrast, p38 appeared to regulate DCC dormancy during late stages of progression, as shown in other models¹. The mechanism behind this p38 switch between eDCCs and DCCs remains unknown. Using genetic lineage analysis, Hosseini et al.³¹ showed that a large proportion of metastases are derived from eDCC ancestors. Thus, understanding the differences between eDCCs and DCCs is important for better targeting these distinct DCC populations.

Our findings change our understanding of how certain oncogenes may initiate dissemination before triggering aggressive proliferation, and of how tumour-suppressor pathways might suppress metastasis, a function that has been attributed to p38 (ref. 1) but was never linked to the early dissemination process. The related study³¹ and our work may also help to explain phenomena such as metastases in cancers of unknown primary origin²⁹ and in patients with DCIS who never developed any local recurrence³⁰. We may also be able to understand how eDCCs found metastases directly and/or by preparing eDCC-mediated pre-metastatic niches that later-arriving DCCs colonize. These findings might inform better ways to target DCCs in all their forms to prevent metastasis.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 


Received 17 October 2015; accepted 3 November 2016. 
Published online 14 December 2016. 


1. Sosa, M. S., Bragado, P. & Aguirre-Ghiso, J. A. Mechanisms of disseminated 
cancer cell dormancy: an awakening field. Nat. Rev. Cancer 14, 611-622 
(2014). 

2. Hanahan, D. & Weinberg, R. A. Hallmarks of cancer: the next generation. Cell 
144, 646-674 (2011). 

3. Schardt, J. A. et al. Genomic analysis of single cytokeratin-positive cells from 
bone marrow reveals early mutational events in breast cancer. Cancer Cell 8, 
227-239 (2005). 

4. Klein, C. A. et al. Comparative genomic hybridization, loss of heterozygosity, 
and DNA sequence analysis of single cells. Proc. Natl Acad. Sci. USA 96,
4494-4499 (1999). 

5. Schmidt-Kittler, O. et al. From latent disseminated cells to overt metastasis:
genetic analysis of systemic breast cancer progression. Proc. Natl Acad. Sci.
USA 100, 7737-7742 (2003). 

6. Rhim, A. D. et al. EMT and dissemination precede pancreatic tumor formation. 
Cell 148, 349-361 (2012). 

7. Eyles, J. et al. Tumor cells disseminate early, but immunosurveillance limits 
metastatic outgrowth, in a mouse model of melanoma. J. Clin. Invest. 120, 
2030-2039 (2010). 

8. Hüsemann, Y. et al. Systemic spread is an early step in breast cancer. Cancer
Cell 13, 58-68 (2008). 

9. Sanger, N. et al. Disseminated tumor cells in the bone marrow of patients with 
ductal carcinoma in situ. Int. J. Cancer 129, 2522-2526 (2011). 

10. Wen, H. C. et al. p38α signaling induces anoikis and lumen formation during
mammary morphogenesis. Sci. Signal. 4, ra34 (2011). 

11. Cardiff, R. D. Validity of mouse mammary tumour models for human breast 
cancer: comparative pathology. Microsc. Res. Tech. 52, 224-230 (2001). 

12. Lu, J. et al. 14-3-3ζ cooperates with ErbB2 to promote ductal carcinoma in situ
progression to invasive breast cancer by inducing epithelial-mesenchymal 
transition. Cancer Cell 16, 195-207 (2009). 

13. Strippoli, R. et al. p38 maintains E-cadherin expression by modulating 
TAK1-NF-κB during epithelial-to-mesenchymal transition. J. Cell Sci. 123,
4321-4331 (2010). 

14. Entenberg, D. et al. Setup and use of a two-laser multiphoton microscope for 
multichannel intravital fluorescence imaging. Nat. Protocols 6, 1500-1520 
(2011). 

15. Malladi, S. et al. Metastatic latency and immune evasion through autocrine
inhibition of WNT. Cell 165, 45-60 (2016). 




16. Leung, J. Y. et al. Activation of AXIN2 expression by β-catenin-T cell factor.
A feedback repressor pathway regulating Wnt signaling. J. Biol. Chem. 277,
21657-21665 (2002). 

17. Grumolato, L. et al. Canonical and noncanonical Wnts use a common 
mechanism to activate completely unrelated coreceptors. Genes Dev. 24, 
2517-2530 (2010). 

18. Nieto, M. A., Huang, R. Y., Jackson, R. A. & Thiery, J. P. EMT: 2016. Cell 166,
21-45 (2016). 

19. Bragado, P. et al. TGF-β2 dictates disseminated tumour cell fate in target organs
through TGF-β-RIII and p38α/β signalling. Nat. Cell Biol. 15, 1351-1361 (2013).

20. Brisken, C. et al. Essential function of Wnt-4 in mammary gland development 
downstream of progesterone signaling. Genes Dev. 14, 650-654 (2000). 

21. Rhim, A. D. et al. Detection of circulating pancreas epithelial cells in patients
with pancreatic cystic lesions. Gastroenterology 146, 647-651 (2014). 

22. Ocaña, O. H. et al. Metastatic colonization requires the repression of the
epithelial-mesenchymal transition inducer Prrx1. Cancer Cell 22, 709-724 
(2012). 

23. Brabletz, T. To differentiate or not — routes towards metastasis. Nat. Rev. Cancer 
12, 425-436 (2012). 

24. Schmidt, J. M. et al. Stem-cell-like properties and epithelial plasticity arise as
stable traits after transient Twist1 activation. Cell Reports 10, 131-139 (2015). 

25. Fischer, K. R. et al. Epithelial-to-mesenchymal transition is not required for lung 
metastasis but contributes to chemoresistance. Nature 527, 472-476 (2015). 

26. Hui, L. et al. p38α suppresses normal and cancer cell proliferation by
antagonizing the JNK-c-Jun pathway. Nat. Genet. 39, 741-749 (2007). 

27. Gozdecka, M. et al. JNK suppresses tumor formation via a gene-expression 
program mediated by ATF2. Cell Reports 9, 1361-1374 (2014). 

28. Bhoumik, A. et al. Suppressor role of activating transcription factor 2 (ATF2) in 
skin cancer. Proc. Natl Acad. Sci. USA 105, 1674-1679 (2008). 

29. Fizazi, K. et al. Cancers of unknown primary site: ESMO clinical practice 
guidelines for diagnosis, treatment and follow-up. Annals Oncol. 26, v133-v138 
(2015). 

30. Narod, S. A., Iqbal, J., Giannakeas, V., Sopik, V. & Sun, P. Breast cancer mortality 
after a diagnosis of ductal carcinoma in situ. JAMA Oncol. 1, 888-896 (2015). 

31. Hosseini, H. et al. Early dissemination seeds metastasis in breast cancer.
Nature http://dx.doi.org/10.1038/nature20785 (2016). 


Supplementary Information is available in the online version of the paper. 


Acknowledgements We thank R. Parsons and P. Poulikakos for PI3K and AKT inhibitors, and S. Aaronson and H.-C. Wen for WNT3A, SFRP1 and DKK1 reagents and expertise. Grant support: HHMI (R.J.D.); SWCRF (J.A.A.-G. and E.F.F.); CA109182, CA196521 (J.A.A.-G.); CA163131 (J.A.A.-G. and J.C.); CA100324 (J.C.); F31CA183185 (K.H.); BC132674 (J.A.A.-G. and J.C.); BC112380 (M.S.S.); NIH 1S10RR024745; Microscopy CoRE at ISMMS; DFG KL 1233/10-1 and the ERC (322602) (C.A.K.).


Author Contributions K.L.H. designed, performed experiments, analysed 
data and co-wrote the manuscript; M.S.S. designed experimental approach, 
performed experiments, executed intravital imaging, provided oversight, 
analysed data and co-wrote the manuscript; D.E. designed and executed 
intravital imaging, analysed data and co-wrote the manuscript; H.H. provided 
materials and analysed data; A.A.V. performed experiments; C.N. provided 
materials and histopathological analysis; J.F.C. managed mouse colonies and 
performed experiments; R.N. performed experiments and analysed data; 
N.G. maintained the Mkk3/Mkk6 wild-type and knockout mice and provided 
materials; R.J.D. provided materials and co-wrote the manuscript; C.A.K. provided
input for the writing of the manuscript; J.C. designed intra-vital experiments,
analysed data and co-wrote the manuscript; E.F.F. provided expertise and
analysed data; J.A.A.-G. designed and optimized experimental approach,
provided general oversight, collected microscopy data, analysed data and
co-wrote the manuscript.


Author Information Reprints and permissions information is available at 
www.nature.com/reprints. The authors declare competing financial interests: 
details are available in the online version of the paper. Readers are welcome to 
comment on the online version of the paper. Correspondence and requests 
for materials should be addressed to M.S.S. (maria.sosa@mssm.edu) or
J.A.A.-G. (julio.aguirre-ghiso@mssm.edu).


Reviewer Information Nature thanks M. Bissell, C. Ghajar and the other 
anonymous reviewer(s) for their contribution to the peer review of this work. 




METHODS 

Cells and cell culture. MCF10A cells were obtained from J. Brugge and from ATCC (MCF10A; ATCC CRL-10317) for verification of phenotype. MCF10A cells were authenticated by culture in 3D Matrigel as reported previously¹⁰. MCF10A-HER2 cells were generated by transfection with HER2 plasmids obtained from L. Petty, and the identity of the vectors was confirmed by sequencing. HER2-expressing cells were selected with G418. MCF10A cells expressing SFRP1 were generated using SFRP1 lentiviral vectors. WNT3A- and DKK1-conditioned media were prepared from WNT3A-expressing L-cells and DKK1-expressing 293T cells. Vectors and SFRP1 plasmids were a gift from S. Aaronson. Conditioned media were prepared from cells cultured in serum-free medium (DMEM with 1% penicillin and streptomycin) for 24 h and then concentrated using Vivaspin 20 centrifugal concentrating tubes (Sartorius, VS2021) at 3,000g for up to 3 h until the desired concentration (10×) was reached.

Mammospheres and tumourspheres assays. Animal procedures were approved by the Institutional Animal Care and Use Committee (IACUC) of Icahn School of Medicine at Mount Sinai, protocol 08-0366. MMTV-Her2 mice were euthanized using CO2 at 14-18 weeks of age or when overt tumours had formed (primary tumours). For mammosphere preparations, all 5 pairs of mammary glands were checked for the presence of any visible small lesions or palpable tumours when processed for early cancer cells, and none were found. Even when other mammary glands were inspected microscopically in whole mounts, we could not detect small tumours. Whole mammary glands or tumours were digested in collagenase and bovine serum albumin (BSA) at 37°C for 45-60 min. Red-blood-cell lysis buffer was used to remove blood cells, and cells were then plated for 10-15 min in DMEM containing 10% fetal bovine serum in 35-mm dishes at 37°C to remove fibroblasts. Cells were then incubated in 2 mM PBS-EDTA for 15 min at 37°C and passed through a 25-gauge needle. Cell suspensions were then filtered through a 70-µm filter before counting. Cells were seeded in 6-well ultra-low-adhesion plates at a density >5.0 x 10° cells per well in 1 ml mammosphere medium (DMEM/F12 (Gibco 11320-082), 1:50 B27 (Invitrogen 17504-044), 20 ng ml−1 EGF (Peprotech AF-100-15-A), 1:100 penicillin and streptomycin).

Animal experiments. Animal procedures were approved by the Institutional Animal Care and Use Committee (IACUC) of Icahn School of Medicine at Mount Sinai, protocol 08-0366. Tumours were not allowed to grow beyond the IACUC-allowed limit of 1,000 mm³ per animal. Animals were randomized and assigned to a group when they reached a certain age, so there was no prior knowledge or assumption when assigning the mice to treatments. Approximately 300 spheres from early lesions or primary tumours were injected per site into nude mice (BALB/c nude, Charles River). Suspension cultures were spun at 300 r.p.m. for 4 min and then resuspended in 150 µl PBS with Ca²⁺ and Mg²⁺ per 300 spheres. Matrigel (Corning 356231) was then added in a 1:1 ratio. Spheres were injected into both fourth inguinal mammary fat pads using a 27-gauge needle. Mice injected with tumour-derived spheres were euthanized when the tumour reached 1,000 mm³, according to IACUC regulations. Incidence was calculated at 1, 3 and 12 months; the 12-month time point was not assessed for the tumoursphere group. Micro-metastases were defined as clusters of 3-20 cells and macro-metastases as clusters of more than 20 cells.
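The Methods define lesion classes by cluster size (micro-metastases, 3-20 cells; macro-metastases, more than 20 cells), whereas the main text uses 6-50 and more than 50 Her2+ cells for lung metastases. A minimal sketch of such a size-based classification follows, with the cut-offs exposed as parameters so either scheme can be applied; the function names and the label used for clusters below the micro-metastasis threshold are hypothetical.

```python
from collections import Counter


def classify_lesion(n_cells, micro_min=3, macro_min=21):
    """Assign a lung lesion to a size class from its cell count.

    Defaults follow the Methods (3-20 cells: micro; >20 cells: macro).
    Use micro_min=6, macro_min=51 for the 6-50 / >50 cut-offs in the main text.
    """
    if n_cells < 1:
        raise ValueError("a lesion must contain at least one cell")
    if n_cells < micro_min:
        return "solitary cell or small cluster"
    if n_cells < macro_min:
        return "micro-metastasis"
    return "macro-metastasis"


def tally(cluster_sizes, **cutoffs):
    """Percentage of lesions falling into each size class."""
    counts = Counter(classify_lesion(n, **cutoffs) for n in cluster_sizes)
    total = sum(counts.values())
    return {label: 100.0 * c / total for label, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical cluster sizes counted on lung sections.
    sizes = [1, 1, 5, 30]
    print(tally(sizes))                            # Methods cut-offs
    print(tally(sizes, micro_min=6, macro_min=51))  # main-text cut-offs
```

Keeping the thresholds as arguments makes it explicit which definition a given percentage was computed with.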

3D mammary primary epithelial cell and MCF10A organoid cultures. MMTV-Her2 mice were euthanized using CO2 at 14-18 weeks of age, and mammary epithelial cells (MECs) were isolated using the same protocol as for mammosphere preparation. As for mammosphere preparations, all 5 pairs of mammary glands were checked for the presence of any visible or palpable tumours when processed for early cancer cells, and none were found, even in whole mounts prepared from mammary glands of the same mouse processed for mammosphere preparation. 5.0 x 10* MECs were seeded in 400 µl assay medium (DMEM/F12, 5% horse serum, 1% penicillin and streptomycin, and 20 ng ml−1 EGF plus 2% Matrigel) in 8-well chamber slides with 40 µl of Matrigel. Organoids formed at an efficiency of around 30 organoids per 1.0 × 10⁴ MECs plated. MCF10A-HER2 cells were cultured in three-dimensional cultures as previously described**”’. In brief, cells were seeded in 400 µl assay medium consisting of DMEM/F12, 5% horse serum, 1% penicillin and streptomycin, and EGF plus 2% Matrigel on 40 µl Matrigel (Corning) in 8-well chamber slides (Falcon 354108).

Treatments of organoid cultures. Cultures were treated every 24 h starting at day 6 with 5 µM SB203580 (Calbiochem, 559395). 6 x 10° cells were seeded for immunofluorescence studies and were fixed at day 12 with 4% paraformaldehyde (PFA) or 10% formalin with phosphatase and protease inhibitors (NaVO3, NaF, pepstatin A, leupeptin and aprotinin). To measure mRNA changes in 3D cultures, 5.0 × 10⁴ cells were seeded in 1 ml Matrigel in a 24-well plate, and RNA was extracted from cultures at day 12 using 1 ml Trizol (Ambion 15596018). Cultures were treated every 24 h with DMSO or 5 µM SB203580 and 500 ng ml−1 DKK1 for 48 h following organoid formation, and fixed for immunofluorescence with 4% PFA or 10% formalin with phosphatase and protease inhibitors. In some cases, spheres growing in suspension conditions were seeded in assay medium plus 2% Matrigel directly on 40 µl of Matrigel, and 3 days later the number of invasive spheres was counted.

Differential interference contrast microscopy. 3D Matrigel organoids were time-lapse imaged using a Zeiss LSM 880 confocal microscope with Airyscan and a 10×/0.45 objective lens with 4× zoom. We recorded four positions per condition in parallel for 2 h at 20-min intervals, with the imaging depth varying with the depth of the cell. Temperature was maintained at 37°C and CO2 at 5%. Zen 2.1 software was used to acquire and export images as uncompressed AVI files. Videos were made using either Zen 2.1 or ImageJ software. This was done using the services of the Microscopy Facility at Icahn School of Medicine at Mount Sinai.

Immunofluorescence. 3D cultures were fixed with 4% PFA for 20 min at room temperature in the presence of phosphatase and protease inhibitors. Staining was performed as previously described for MCF10A 3D cultures™. In brief, cells were permeabilized using 0.1% Triton X-100 in PBS for 20 min. Blocking was done for 1 h using 1× immunofluorescence PBS wash buffer (130 mM NaCl; 7 mM Na2HPO4; 3.5 mM NaH2PO4; 7.7 mM NaN3; 0.1% BSA; 0.2% Triton X-100; 0.05% Tween-20) containing 10% normal goat serum (Gibco, PCN5000). Primary antibodies used were: E-cadherin (BD Biosciences 610181); β-catenin (BD Biosciences 610153); laminin V (Progen 10765); F-actin (Life Technologies A12380); α-smooth muscle actin (Sigma-Aldrich C6198); active β-catenin (Millipore 05-665); CK8/18 (ProGen GP11). The following secondary antibodies were used: AlexaFluor goat-anti-mouse 488, AlexaFluor goat-anti-rabbit 568. Chambers were removed from slides, and wells were fixed and mounted with ProLong Gold Antifade reagent with DAPI (Invitrogen P36931). 2D cultures were fixed with 4% PFA for 20 min at 4°C. Cells were permeabilized in 0.1% Triton X-100 and then blocked in 3% normal goat serum for 30 min at room temperature. Primary antibodies were incubated for 1 h at room temperature in 0.1% BSA in PBS, followed by an additional blocking step. Secondary antibodies were incubated for 1 h at room temperature in 0.1% BSA in PBS. Coverslips were mounted using ProLong Antifade mounting medium with DAPI (Molecular Probes P36930). Primary antibodies used were: E-cadherin (BD Biosciences, 610181) and p-ATF2 (Cell Signaling, 9226). Imaging of 3D organoids was done using confocal microscopy. Images were obtained using Leica software on a Leica SP5 confocal microscope. Mammary gland sections were imaged on a Leica DM550 fluorescence microscope using Leica software. Dye separation analysis was done using Leica software. Two-photon imaging was performed following the reported protocols¹⁴.

Immunohistochemistry. Tissues were fixed in 10% formalin, paraffin-embedded and cut into 4-6 µm sections. Following dehydration of the slides, antigen retrieval was done in 10 mM citrate buffer pH 6.0 (Na3C6H5O7). Blocking was done using 0.1% BSA in PBS with 10% normal goat serum for 30 min. Primary antibodies were incubated overnight at 4°C. The following primary antibodies were used: E-cadherin (BD Biosciences, 610181); β-catenin (BD Biosciences, 610153); Her2 (Abcam, ab2428). VectaStain Elite ABC Rabbit IgG (PK-6101) and Mouse IgG (PK-6102) kits from Vector Laboratories were used for secondary antibodies, which were incubated for 1 h at room temperature. The DAB substrate kit (Vector Laboratories, SK-4100) was used as the enzymatic substrate. Mounting was done using Vectashield mounting medium (Vector Laboratories, H-1400). For immunofluorescence on tissue sections, tissues were fixed in 10% formalin, paraffin-embedded and cut into 4-6 µm sections. For BALB-Her2-T staining, paraffin-embedded BALB/c and BALB-Her2-T mammary gland and tumour sections were a gift from C. Klein. Following dehydration of the slides, antigen retrieval was done in 10 mM citrate buffer (Na3C6H5O7). Triton X-100 was used to permeabilize cells, and blocking was done using 0.1% BSA in PBS with 10% normal goat serum or normal donkey serum (Sigma-Aldrich D9663). Primary antibodies were incubated overnight at 4°C and secondary antibodies for 1 h at room temperature. The following primary antibodies were used: E-cadherin (BD Biosciences, 610182); β-catenin (BD Biosciences, 610153); Her2 (Abcam, ab2428); p-Atf2 (Cell Signaling, 9226); p-Rb (Ser249/Thr252; Santa Cruz, sc-16671); CK8/18 (Progen 412121); and p-p38 (BD Biosciences, 612281). The following secondary antibodies were used: AlexaFluor 488 goat-anti-mouse, AlexaFluor 568 goat-anti-rabbit, AlexaFluor 488 donkey-anti-goat, AlexaFluor 547 donkey-anti-rabbit. Slides were mounted using ProLong Antifade mounting medium with DAPI (Molecular Probes P36930).

Western blot. Samples were collected in 1× RIPA buffer and centrifuged at 15,000g at 4°C to clarify the lysate. Protein concentrations were calculated using the Bio-Rad Protein Assay Dye Reagent (Bio-Rad 500-0006) and a BSA standard curve. Samples were then boiled for 5 min at 95°C in sample buffer (0.04 M Tris-HCl pH 6.8, 1% SDS, 1% β-mercaptoethanol and 10% glycerol). 6-10% SDS-PAGE gradient gels were run in running buffer (25 mM Tris, 190 mM glycine, 0.1% SDS) and transferred to PVDF membranes in transfer buffer (25 mM Tris, 190 mM glycine, 20% methanol). Membranes were then blocked in 5% milk in TBST (Tris-buffered saline containing Tween-20). Primary antibodies were incubated overnight at 4°C. After washing with TBST, HRP-conjugated secondary antibodies were incubated at room temperature for 1 h. Blots were developed using Amersham ECL Western Blot Detection (GE, RPN 2106) and a GE ImageQuant LAS 4010. Primary antibodies used were: p-p38 (BD Biosciences, 612281); p38 (BD Biosciences, 612169); p-ERK1/2 (Cell Signaling, 9101); ERK (BD Biosciences, 610031); and Gapdh (Calbiochem, CB1001). Secondary antibodies used were peroxidase horse-anti-mouse IgG (Vector Laboratories, P2000) and biotinylated goat-anti-rabbit IgG (Vector Laboratories, BA-1000).
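Protein concentrations were read off a BSA standard curve. The sketch below assumes a straight-line (least-squares) fit over the working range of the dye assay; the standards, absorbance readings, dilution factor and per-lane protein load are hypothetical, not values from the paper. Because dye-binding responses are only approximately linear, samples should be diluted so that their readings fall within the range of the standards.

```python
def fit_standard_curve(conc, absorbance):
    """Ordinary least-squares line: absorbance = slope * conc + intercept."""
    n = len(conc)
    if n != len(absorbance) or n < 2:
        raise ValueError("need at least two paired standards")
    mx = sum(conc) / n
    my = sum(absorbance) / n
    sxx = sum((x - mx) ** 2 for x in conc)
    sxy = sum((x - mx) * (y - my) for x, y in zip(conc, absorbance))
    slope = sxy / sxx
    return slope, my - slope * mx


def sample_concentration(a, slope, intercept, dilution=1.0):
    """Invert the standard curve and correct for any pre-assay dilution."""
    return (a - intercept) / slope * dilution


def load_volume(target_ug, conc_ug_per_ul):
    """Lysate volume (µl) containing target_ug of protein."""
    return target_ug / conc_ug_per_ul


if __name__ == "__main__":
    # Hypothetical BSA standards (µg/µl) and their absorbance readings.
    standards = [0.0, 0.25, 0.5, 0.75, 1.0]
    readings = [0.05, 0.20, 0.35, 0.50, 0.65]
    slope, intercept = fit_standard_curve(standards, readings)
    # Lysate diluted 1:5 before the assay (hypothetical).
    conc = sample_concentration(0.41, slope, intercept, dilution=5)
    print(f"lysate: {conc:.2f} µg/µl; load {load_volume(30, conc):.1f} µl for 30 µg")
```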

EMT qPCR array. RNA was extracted from 3D cultures using TriZol extraction as per the manufacturer's recommendations. cDNA was synthesized using the Qiagen RT2 First Strand Kit (Qiagen 330401). Expression of EMT genes was measured using the Qiagen human Epithelial to Mesenchymal Transition PCR array (Qiagen PAHS-090Z) and RT2 qPCR Master Mixes (Qiagen 330521). Plates were run in an ABI PRISM 7900HT sequence detection system. Results were analysed using web-based PCR-array data-analysis software provided by Qiagen. Heat maps were generated based on triplicate runs of the array using GENE-E software (Broad Institute).
Quantitative PCR. RNA was extracted from 2D and 3D cell cultures using TriZol following the provider's recommendation. For pre-malignant mammary gland tissue, RNA was extracted using Qiagen's RNeasy Lipid Tissue Midi Kit (Qiagen 74804). 2 µg RNA was reverse transcribed into cDNA using MMuLV Reverse Transcriptase (New England Biolabs, M0253L), MMuLV buffer (NEB, B0253S) and RNase inhibitor (Ambion, AM2682). Quantitative real-time PCR (qRT-PCR) was performed using a Sybr Green Powder PCR Master Mix (Bio-Rad, 170-8882) or a master mix prepared with Sybr Green (Sigma-Aldrich, S9430), MgCl2 (NEB, B9021S), dNTPs (NEB N0447L) and Taq DNA polymerase (Sigma-Aldrich D6558) in a Bio-Rad thermocycler. Gapdh was used as a housekeeping control for all plates.
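The Methods name Gapdh as the housekeeping control but do not state how relative expression was computed. One common choice is the comparative Ct (2^-ΔΔCt) method, sketched below under the assumption that the target and reference primer pairs amplify with similar, near-100% efficiency; the function name and the Ct values in the example are hypothetical.

```python
from statistics import mean


def delta_delta_ct(target_test, ref_test, target_ctrl, ref_ctrl):
    """Fold change of a target gene (test versus control) by the 2^-ddCt method.

    Each argument is a list of technical-replicate Ct values. The reference
    gene (here assumed to be Gapdh) normalizes for input amount. Assumes
    near-100% amplification efficiency for both primer pairs.
    """
    d_test = mean(target_test) - mean(ref_test)  # dCt in treated sample
    d_ctrl = mean(target_ctrl) - mean(ref_ctrl)  # dCt in control sample
    return 2 ** -(d_test - d_ctrl)


if __name__ == "__main__":
    # Hypothetical Twist1 and Gapdh Ct values in triplicate.
    fold = delta_delta_ct([24.0, 24.0, 24.0], [18.0, 18.0, 18.0],
                          [26.0, 26.0, 26.0], [18.0, 18.0, 18.0])
    print(f"Twist1 fold change (treated / control): {fold:.1f}")
```

A lower ΔCt in the treated sample (target crossing threshold earlier relative to Gapdh) yields a fold change above 1.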
qPCR primers. Human primers: GAPDH forward 5′-GGTGAAGGTCGGAGTCAACGG-3′, GAPDH reverse 5′-ATGAAGGGGTCATTGATGGCAACAA-3′; CDH1 forward 5′-ATGGGGTCTTGCTATGTTGC-3′, CDH1 reverse 5′-AAGGCAGAAGGATTGCTTGA-3′; TWIST1 forward 5′-GTCCGCAGTCTTACGAGGAG-3′, TWIST1 reverse 5′-CCAGCTTGAGGGTCTGAATC-3′; SNAI1 forward 5′-AGAGCTGACCTCCCTGTCA-3′, SNAI1 reverse 5′-TGAAGTAGAGGAGAAGGACGAA-3′; WNT11 forward 5′-CATGGAGCTCTGCTTGTGAA-3′, WNT11 reverse 5′-GCTTCCAAGTGAAGGCAAAG-3′; WNT5A forward 5′-GAAATGCGTGTTGGGTTGA-3′, WNT5A reverse 5′-AGGCATGGGTTTCCATTCT-3′; WNT5B forward 5′-CCAAAGGATCAGAGGAGCAG-3′, WNT5B reverse 5′-CTCGTTGTTTTGCAGGTTCA-3′; ERBB3 forward 5′-GGCGGCACTTTTCTCTACTG-3′, ERBB3 reverse 5′-CGTTCCAAGTATCGCCTCAT-3′; FZD7 forward 5′-TGGGTTAATTTCCAFFTCA-3′, FZD7 reverse 5′-GCAGTACGGGAGGAAAAACA-3′; AXIN2 forward 5′-CTGGTGCAAAGACATAGCCA-3′, AXIN2 reverse 5′-GTCCAGCAAAACTCTGAGGG-3′. Mouse primers: Gapdh forward 5′-AACTTTGGCATTGTGGAAGGGCTC-3′, Gapdh reverse 5′-TGGAAGAGTGGGAGTTGCTGTTGA-3′; Twist1 forward 5′-AACTGGCCTGCAAAATCATA-3′, Twist1 reverse 5′-ACACCGGATCTATTTGCATT-3′.
Two-photon intra-vital microscopy of mammary glands. For this work a sin- 
gle laser source tuned to 880 nm provided excitation for both the CFP tumour 
cells and the 155 kDa TRITC-dextran vascular label, in addition to producing a 
second harmonic generation signal from collagen fibres. Imaging was done with 
a 25x 1.05NA (XLPL25XWMP2, Olympus) water-immersion objective lens so 
as to bridge between low-magnification visualization of the ductal tree and high 
resolution single-cell imaging. For each mouse, a large 25-100 field mosaic was 
acquired to ascertain the ductal tree structure, from which three separate fields 
were selected for time lapse imaging , using 51m z steps to a depth of approximately 
50 1m, with each stack taken approximately every 2 min, for 4-6 h. Images were 
reconstructed and analysed either in Image]**, using the custom written Image] 
plugin, ROI_Tracker’* or with Imaris (Bitplane). Mice were anaesthetized using 
0.75-2.5% isofluorane, depilated and a skin flap surgery performed exposing 
the 2nd and 3rd mammary fat pad. The absence of a solid tumour necessitated 
the development of a custom fixturing technique wherein the exposed fat pad 
was affixed with cyanoacrylate glue to the edge of a 15-mm window fitted with 
a 12-mm diameter cover glass. The window was captured on a fixturing plate 
placed on the microscope xy stage and imaging was performed in the centre 
of the window away from the glue. Animals were maintained at physiological 
temperatures throughout imaging with an AirTherm ATX forced-air heater (WPI 
Inc.) and supplemented intravenously with 50-100 11 of PBS per hour. Intravital 
imaging was performed using a custom-built two-laser multiphoton microscope 
following previously reported imaging protocols!”. All procedures were conducted 


in accordance with the National Institutes of Health regulations and approved by 
the Albert Einstein College of Medicine animal use committee. For computational rendering of the videos in Supplementary Videos 5 and 8, the signals within the segmented vessel and the manually outlined cell were separately extracted into a sequence of TIFF images and then imported into Imaris. A colocalization algorithm was applied to the two signals to identify overlapping pixels. Intensity-based surface reconstructions of the vessel (red), tumour cells (cyan) and the colocalization signal (yellow) were created and then animated. Any residual xy drift not eliminated by the fixturing window was removed in post-processing using the StackReg plugin for ImageJ.
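The colocalization step above was performed in Imaris, and its exact algorithm and thresholds are not given here. As a minimal sketch of the general idea only, assuming each channel has already been extracted as a 2D intensity image for one frame of the TIFF sequence and that a pixel counts as colocalized when both channels exceed hypothetical per-channel thresholds, the overlap mask and the share of cell pixels inside the vessel can be computed as follows:

```python
from typing import List

Image = List[List[float]]  # one 2D channel: rows of pixel intensities


def colocalization_mask(vessel: Image, cell: Image,
                        vessel_thresh: float, cell_thresh: float) -> List[List[bool]]:
    """True where both channels exceed their (hypothetical) thresholds."""
    if len(vessel) != len(cell) or any(len(a) != len(b) for a, b in zip(vessel, cell)):
        raise ValueError("channels must have identical dimensions")
    return [[v > vessel_thresh and c > cell_thresh for v, c in zip(vrow, crow)]
            for vrow, crow in zip(vessel, cell)]


def overlap_fraction(mask: List[List[bool]], cell: Image, cell_thresh: float) -> float:
    """Share of above-threshold cell pixels that also fall inside the vessel signal."""
    cell_pixels = sum(1 for row in cell for c in row if c > cell_thresh)
    overlap = sum(1 for row in mask for m in row if m)
    return overlap / cell_pixels if cell_pixels else 0.0


# Hypothetical 2 x 3 images for a single frame
vessel = [[0, 10, 10], [0, 10, 0]]
cell = [[5, 5, 0], [0, 5, 0]]
mask = colocalization_mask(vessel, cell, vessel_thresh=1, cell_thresh=1)
frac = overlap_fraction(mask, cell, cell_thresh=1)
```

A fixed threshold keeps the sketch simple; real colocalization tools often choose thresholds automatically, so this should be read as an illustration of "overlapping pixels", not as the Imaris procedure.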

Experimental metastasis assays. Mammary epithelial cells derived from early lesions (16-week-old MMTV-Her2 females) or tumour-derived cells were intravenously injected (10,000 cells per animal) into nude mice. One month later, mice were euthanized and the number of metastatic foci was counted by histology.
Sphere-forming assays. Mammary epithelial cells from early lesions or from tumours were isolated and seeded in low-attachment 24-well plates in 1 ml sphere medium plus 1% methylcellulose (1,000 cells per well; n = 2 animals per group, in sextuplicate). The number of spheres (clusters of more than 5 cells) per group was counted one week later. These spheres were then dissociated with trypsin and re-plated, and the number of spheres per group was counted again one week later.
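As a worked illustration of the sphere counts, the sketch below applies the stated definition (clusters of more than 5 cells) to hypothetical per-well cluster sizes and expresses the result as a sphere-forming efficiency per 1,000 seeded cells. The percentage-based efficiency metric, the function names and the example data are assumptions for illustration, not the authors' analysis.

```python
from statistics import mean
from typing import Iterable, List

SPHERE_MIN_CELLS = 6            # "more than 5 cells" per the Methods
CELLS_SEEDED_PER_WELL = 1_000   # seeding density per well


def count_spheres(cluster_sizes: Iterable[int]) -> int:
    """Count clusters that qualify as spheres (more than 5 cells)."""
    return sum(1 for size in cluster_sizes if size >= SPHERE_MIN_CELLS)


def sphere_forming_efficiency(cluster_sizes: Iterable[int],
                              cells_seeded: int = CELLS_SEEDED_PER_WELL) -> float:
    """Spheres per seeded cell, as a percentage (assumed metric)."""
    return 100.0 * count_spheres(cluster_sizes) / cells_seeded


def mean_efficiency(wells: List[List[int]]) -> float:
    """Average efficiency across replicate wells (e.g. sextuplicate wells)."""
    return mean(sphere_forming_efficiency(w) for w in wells)


# Hypothetical cluster sizes from two replicate wells
wells = [[3, 6, 12, 5, 40], [8, 2, 9]]
print(mean_efficiency(wells))
```

The same functions could be applied to the re-plated second-generation spheres to compare the two rounds.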
Patient samples. Paraffin-embedded sections from tumours of patients with DCIS or invasive breast cancer were obtained from the Cancer Biorepository at Icahn School of Medicine at Mount Sinai, New York, New York. Samples were de-identified and obtained with Institutional Review Board approval, which indicated that this work does not meet the definition of human subject research according to 45 CFR 46 and the Office of Human Subject Research. Immunofluorescence and immunohistochemistry analyses were done using samples from 10 DCIS and 20 invasive breast-cancer patients. Invasive breast cancer samples included luminal A, luminal B and HER2-positive subtypes.

Detection of circulating cancer cells and DCCs. Sixteen-week-old MMTV-Her2 mice were treated with SB203580 (10 mg kg⁻¹) or DMSO for 2 weeks, and blood (approximately 500 µl per mouse) was drawn by cardiac puncture following IACUC protocols. Circulating cancer cells (CCCs) were purified using a negative lineage cell-depletion kit (130-090-858, Miltenyi Biotec), fixed and stained with anti-CK8/18 antibody in cytospin preparations. CCCs were counted per ml of blood. Bone-marrow cells from 4 long bones (2 tibiae and 2 femurs per mouse) were flushed out with Minimum Essential Medium Eagle (MEME) (Sigma-Aldrich) using a 26-gauge needle and a 1-ml syringe. Tumour cells were enriched by Ficoll-Paque PLUS (GE Healthcare) density gradient separation and filtered through a 70-µm nylon mesh to remove large aggregates. Cells were fixed with
3% PFA for 20 min on ice, and cytospin preparations were carried out by centrifugation of bone-marrow cells at 500 r.p.m. for 3 min using poly-L-lysine-coated slides (Sigma-Aldrich). Bone-marrow-derived DCCs were stained with anti-CK8/18 and anti-Her2 antibodies, and cytospin preparations were analysed. We screened 0.5-2.0 × 10⁶ bone-marrow cells, which represents 20% of the total bone-marrow cells obtained from 2 tibiae and 2 femurs per mouse after Ficoll gradient separation, and counts were then normalized to the total volume in which each bone-marrow DCC sample was resuspended (approximately 1 ml). As for the mammosphere preparations, all 5 pairs of mammary glands were checked for the presence of any visible or palpable tumours when processed for early cancer cells, and none were found, as described above.
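The normalization described above amounts to scaling the DCC count from the screened fraction up to the whole bone-marrow sample. A minimal sketch, assuming the screened cells are a representative 20% of the bone-marrow cells (equivalently, a proportional share of the ~1 ml resuspension volume); the function names and example counts are hypothetical:

```python
FRACTION_SCREENED = 0.20  # screened cells are ~20% of total bone-marrow cells (Methods)


def extrapolate_total_dccs(dccs_counted: int,
                           fraction_screened: float = FRACTION_SCREENED) -> float:
    """Estimate DCCs in the whole bone-marrow sample from the screened fraction."""
    if not 0 < fraction_screened <= 1:
        raise ValueError("fraction_screened must be in (0, 1]")
    return dccs_counted / fraction_screened


def dccs_per_million_cells(dccs_counted: int, cells_screened: float) -> float:
    """Express the count as DCCs per 10^6 screened bone-marrow cells."""
    if cells_screened <= 0:
        raise ValueError("cells_screened must be positive")
    return dccs_counted / cells_screened * 1e6


# Hypothetical example: 17 DCCs found among 0.85 x 10^6 screened cells
total = extrapolate_total_dccs(17)
rate = dccs_per_million_cells(17, 0.85e6)
```

Under the linear-scaling assumption, the per-10⁶-cell rate is the same whether computed on the screened fraction or on the extrapolated total, which is why it is a convenient unit for comparing mice.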

Statistical analysis. Statistical analysis was done using Prism software. Differences were considered significant if P < 0.05. For most cell culture experiments, one-tailed Student's t-tests were performed unless otherwise specified. For mouse experiments, one-tailed Mann-Whitney U-tests were used. Sample sizes were chosen empirically and no exclusion criteria were applied. The investigators were not blinded to allocation during experiments and outcome assessment.
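For readers who want to see what a one-tailed Mann-Whitney U-test computes, the sketch below calculates U and an exact one-sided P value by enumerating every way of relabelling the pooled observations. This is a generic textbook formulation, not Prism's implementation (which may use a normal approximation for larger samples); full enumeration is only practical for small groups such as the 4-5 mice per group used in some experiments here.

```python
from itertools import combinations
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """U statistic for x: pairs where x > y count 1, ties count 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def one_sided_mann_whitney(x: Sequence[float],
                           y: Sequence[float]) -> Tuple[float, float]:
    """Return (U, exact P) for the alternative that x tends to exceed y."""
    observed = mann_whitney_u(x, y)
    pooled = list(x) + list(y)
    as_extreme = 0
    total = 0
    for chosen in combinations(range(len(pooled)), len(x)):
        chosen_set = set(chosen)
        xs = [pooled[i] for i in chosen]
        ys = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        # Count relabellings whose U is at least as large as the observed one
        if mann_whitney_u(xs, ys) >= observed:
            as_extreme += 1
        total += 1
    return observed, as_extreme / total


# Hypothetical per-mouse counts for two groups
u, p = one_sided_mann_whitney([5, 6, 7, 8], [1, 2, 3, 4])
```

Because U only takes multiples of 0.5, the floating-point comparison against the observed value is exact; with tied values the enumeration gives the exact conditional (permutation) distribution.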

Data availability. The datasets generated during and/or analysed during the 
current study are available within the paper (and its Supplementary Information) 
and/or from the corresponding author on reasonable request. 
Extended Data Figure 1 | Progression and staging of MMTV-Her2 mouse models. a, Cartoons depicting the three MMTV-Her2 models used in this study and the different time frames for early lesions (EL) and overt primary tumour (PT) development. HP, hyperplasia; MIN, mammary intraepithelial neoplasia. b, Haematoxylin and eosin staining for sections of normal FVB mouse mammary tissue, and FVB MMTV-Her2 early lesions or primary tumours. c, Whole mounts from mammary glands of FVB MMTV-Her2 mice at the time early lesions were studied. LN, lymph node. d, Representative images of E-cadhi p-ATF2hi (left and inset) and E-cadlo p-ATF2lo (right) ducts in the MMTV-Her2-T model. Scale bar, 10 µm. Arrow in left image, intact E-cadherin junction; arrow in right image, dismantled E-cadherin junction. e, Quantification of the percentage of E-cadhi cells per duct that showed high or low p-ATF2 expression in MMTV-Her2 and MMTV-Her2-T models. *P < 0.01; one-sided, unpaired t-test; mean ± s.e.m. (Her2, n = 30 ducts; Her2-T, n = 10 ducts).
Extended Data Figure 2 | p38 signalling in MMTV-Her2 models and DCIS patient samples. a, Early-stage MMTV-Her2-T (BALB-NeuT, 15 weeks of age) early lesion sections stained for Her2 and β-catenin. Arrows, Her2+ β-catMEMlo early lesion cells; arrowheads, Her2− β-catMEMhi cells. Scale bars, 10 µm. The digital dye separation module (Leica) was used on the images. Graph, quantification of the percentage of cells per duct with β-catMEM for both MMTV-Her2 and MMTV-Her2-T models (n = 7 ducts). b, Immunohistochemistry for p-ATF2 and E-cadherin in MMTV-Her2 early lesion tissues (age, 14-18 weeks) and primary tumour sections. Boxed regions are magnified in the bottom right panel. Note the loss of both p-ATF2 and E-cadherin in primary tumour samples. Scale bar, 25 µm. c, Western blot for the indicated antigens in lysates of mammary epithelial cells isolated from normal mammary glands (FVB) and tumour cells isolated from MMTV-Her2 overt primary tumours (Her2). GAPDH was used as a loading control. For gel source data, see Supplementary Fig. 1. d, Immunohistochemistry for p-p38 in normal epithelium (BALB/c), early lesion tissues (BALB-NeuT early lesions, 7 weeks) and overt primary tumours (BALB-NeuT primary tumours). Graph, percentage of p-p38-positive cells in each stage. n = 11-15 ducts, 5 tumours. Scale bars, 20 µm (inset) and 50 µm; **P < 0.01; ***P < 0.0001; one-tailed Mann-Whitney U-test.
e, Representative images of parallel sections from DCIS patient samples stained for p-ATF2 (red), Her2 (green in insets lower row) or E-cadherin (green in large panels and insets upper row). Samples were Her2-positive (n = 5) or -negative (n = 5) by immunofluorescence microscopy analysis for Her2 (insets top and bottom row left, green). Inset right column, detail of E-cadherin junctions in Her2+ and Her2− samples. Arrowhead, strong E-cadherin junctions; arrow, weak E-cadherin staining. Scale bars, 25 µm and 10 µm (inset). f, MetaMorph software was used to quantify Her2, E-cadherin and p-ATF2 fluorescence signal intensity in the 10 DCIS samples shown in panel e. Mean fluorescence intensity (m.f.i.) ± s.e.m. per cell per field from Her2+ (black bars, n = 5) compared to Her2− (grey bars, n = 5) samples from patients with DCIS. ***P < 0.05; two-way ANOVA. g, Immunohistochemistry for p-p38α and p-ATF2 performed on invasive breast cancer (IBC) tumours from patients (n = 20). Samples were classified as Her2+ (n = 10) or Her2− (n = 10) by the pathology service. Note the significant reduction in both p-p38α and p-ATF2 in Her2+ tumours. Insets show additional patient samples for each group. Graph, MetaMorph was used to determine the mean signal intensity ± s.e.m. per field for p-p38 and p-ATF2: p-p38 intensity, left axis and first two columns of the graph; p-ATF2 intensity, right axis and last two columns of the graph. **P < 0.05; unpaired t-test. Scale bars, 25 µm.
Extended Data Figure 3 | Characterization of invasive and signalling properties of Her2+ early lesions. a, MCF10A-HER2 organoids stained for E-cadherin (green) and DAPI (blue). Left, a representative organoid. Scale bar, 25 µm. Right, details of invading E-cadlo cells (top and bottom). Arrowheads, outward-invading E-cadlo cells. Scale bar, 10 µm. Approximately 35-40% of MCF10A-HER2 organoids show outward invasion of one cell per organoid in equatorial sections; 92 ± 8.3% of those invading cells are E-cadlo. b, MCF10A-HER2 organoids were stained for F-actin (red) and DAPI (blue). Note the extensions of F-actin from invading cells (boxed areas and right top and bottom insets) still in contact with the organoid. Scale bars, 20 µm. c, Detection of E-cadherin and p-ATF2 in MCF10A-HER2 cells treated with or without lapatinib (100 nM), AG1478 (5 µM) and siRNA targeting HER2 (40 nM) for 24 h; E-cadherin (green), p-ATF2 (red). Graph, fold change of the percentage of p-ATF2+ cells. Data are mean ± s.e.m.; **P < 0.01; ***P < 0.001; one-sided, unpaired t-test; n = 3 experimental replicates, 10 images per treatment. d, MCF10A-HER2 cells were treated for 24 h with AG1478 (1 µM) (left), or with the AKT inhibitor MK2206 (5 µM), the pan-PI3K inhibitor GDC-0941 (1 µM) or lapatinib (1 µM) (right). Western blots for p-AKT and total AKT (T-AKT) (left) or p-S6 and β-tubulin (right). Gel source data in Supplementary Fig. 1. Graph, control for HER2 knockdown in MCF10A-HER2 cells; one-sided, unpaired t-test; median and range are shown. e, MMTV-Her2 early lesion organoids were treated with GDC-0941 (1 µM), lapatinib (1 µM) or MK2206 (5 µM) for 24 h. Organoids were fixed and stained for p-ATF2.
Graph, percentage of p-ATF2+ cells per organoid. Scale bars, 25 µm. Median ± s.e.m.; one-sided, unpaired t-test. f, Left, quantification of the percentage of nuclear p-ATF2+ MCF10A-HER2 cells treated for 24 h with vehicle (CTRL), 5 µM SB203580 (SB), 100 nM lapatinib (LAP) or the combination of the two drugs. Right, representative immunofluorescence images of the p-ATF2 signal (red); DAPI (blue) was used to count total cell numbers. Insets and arrows show a detail of nuclear p-ATF2 levels in the respective groups. One-sided Mann-Whitney U-test at 95% confidence; median and range are shown; n = 2 independent wells per condition; n > 150 cells scored per condition. Scale bars, 25 µm. g, MCF10A-HER2 organoids treated for 6 days with SB203580 or DMSO and stained for F-actin (red). Bottom left graph, percentage ± s.e.m. of MCF10A-HER2 organoids with outward invasion (DMSO n = 109; SB203580 n = 87). ***P < 0.001; one-sided, unpaired t-test. Scale bars, 10 µm. Bottom right graph, percentage of invasive MMTV-Her2 organoids (DMSO n = 11; SB203580 n = 9). P = 0.01; one-sided, unpaired t-test; mean ± s.d. Representative of 3 biological replicates. h, MMTV-Her2 early lesion sections (age, 14-18 weeks) stained for CK8/18 (red), Her2 (green) and nuclei (DAPI, blue). Top image, a duct is outlined. The boxed region and bottom images show CK8/18+ and Her2+ singlets or doublets within the stroma near ducts. Graph, percentage of stroma-invading cells that were either double positive for CK8/18 and Her2 or single CK8/18+. Mean ± s.e.m.; **P < 0.01; one-sided Mann-Whitney U-test; n = 4 mice, 60-80 cells per mouse. Scale bars, 10 µm.
Extended Data Figure 4 | Quantification of early dissemination by Her2+ early cancer cells. a, MMTV-Her2-NDL5-CFP early lesion mammary gland tissues (seven-week-old females) co-stained for CFP (green) and Her2 (red). Arrows, co-distribution of Her2 and CFP. Graph, percentage of positive cells for the single Her2 staining (white bars) or double co-staining for Her2 and CFP (black bars) per field. Approximately 88% of early lesion cells are positive for Her2 and CFP. Scale bars, 10 µm. b, Early circulating cancer cells (eCCCs) were detected in cytospin preparations by staining for CK8/18 (green) and nuclei with DAPI (blue) after a Ficoll gradient and negative selection (see Methods). Scale bar, 10 µm. c, Detection of eDCCs in lung sections from MMTV-Her2 mice by immunohistochemistry for Her2 (rabbit anti-Her2 antibody (Abcam, ab2428)). Scale bar, 25 µm. Right, enlarged images from additional sections. Red arrowheads, Her2-positive DTCs; red asterisks, host Her2-negative cells. Scale bars, 10 µm. Staining controls are shown in e. d, eDCCs in the bone marrow of MMTV-Her2 mice detected in cytospin preparations of whole bone-marrow samples after a Ficoll gradient and staining for CK8/18 (green), Her2 (red) and DAPI (blue). CK8/18+, Her2+ or double-positive cells were considered eDCCs. Right, individual channel
signals. Left, merge of the channels shown on the right, detecting a bone-marrow CK8/18+Her2+ DCC (arrow) next to a CK8/18−Her2− bone-marrow cell (asterisk). Scale bars, 10 µm. e, Top, immunohistochemistry for Her2 in non-transgenic FVB lung sections. Her2+ cells were undetectable in FVB lung sections. Bottom, IgG isotype control for the Her2 antibody used in c in lungs of MMTV-Her2 mice. Scale bars, 50 µm. f, g, Top, IgG control images for eDCC detection in MMTV-Her2 lung sections. Bottom, example of Her2+ (red) staining using the Calbiochem (OP15L) (f) and Abcam (ab2428) (g) anti-Her2 antibodies. M, mouse; R, rabbit. Scale bars, 10 µm. h, eCCCs detected by CK8/18 positivity as in b in blood of MMTV-Her2 mice (age, 14-18 weeks) treated for 2 weeks with DMSO (C) or the p38α/β inhibitor SB203580 (DMSO n = 4; SB n = 5 mice). i, eDCCs detected by CK8/18 positivity as in d in bone marrow of MMTV-Her2 mice treated as in h (n = 5 mice per group). j, DCCs detected in the lung of MMTV-Her2 mice carrying only early lesions as in c and treated as in h. Graph, percentage of Her2+ eDCCs per field in each group (n = 30 fields, 3 mice per treatment). For h-j, median and individual fields (j) or mice (h, i) are shown; *P < 0.05; ***P < 0.001; one-sided Mann-Whitney U-test.
Extended Data Figure 5 | EMT markers in Her2+ early lesion cells. a, MCF10A-HER2 organoids treated for 6 days with SB203580 or control (DMSO) were stained for E-cadherin (green) and β-catenin (red) or DAPI (blue); n = 20 MCF10A-HER2 organoids per treatment. Scale bars, 10 µm (top), 20 µm (bottom). b, MCF10A-HER2 organoids treated with siRNAs targeting p38α or a non-targeting control (siCTL); n = 20 organoids per treatment. Scale bars, 10 µm (top), 20 µm (bottom). c, Quantification of a and b. *P < 0.048; NS, not significant; one-sided, unpaired t-test. d, Graph, percentage ± s.e.m. of β-cateninMEM in MCF10A-HER2 organoids with or without SB203580 and with or without p38α siRNA. *P = 0.0047; one-sided, unpaired t-test. e, MMTV-Her2 organoids treated with p38α or control siRNA (48 h) and stained to detect active β-catenin (see Methods). Graph, percentage ± s.e.m. of organoids stained for active β-catenin (n = 10 organoids per treatment). Scale bars, 25 µm. P = 0.0059; one-sided, unpaired t-test. f, AXIN2 mRNA expression in MCF10A-HER2 cultures treated for 24 h with DMSO control (C) or SB203580 (5 µM). Technical triplicate determinations were normalized to GAPDH and fold change (FC) over control was determined for five biological replicates. P < 0.05; one-sided, unpaired t-test; mean ± s.e.m. g, mRNA levels for SNAI1 and TWIST1 normalized to GAPDH in MCF10A-HER2 3D cultures treated with siRNA targeting the p38α isoform or ATF2 from day 6 to day 12. Graph,
fold change over control in three biological replicates. **P < 0.01; *P < 0.0001; one-sided, unpaired t-test; mean ± s.e.m. h, Sections of MMTV-Her2 early lesions in mice treated for 2 weeks with SB203580 (see Methods) stained for E-cadherin (top) and β-catenin (bottom). Scale bars, 15 µm. Arrows, membrane E-cadherin or β-catenin (bottom left) or nuclear β-catenin (bottom right). Boxed area is shown in Fig. 3c. i, Isotype-matched mouse IgG control immunohistochemistry for β-catenin and E-cadherin in mammary intraepithelial neoplasia and a primary tumour. Scale bars, 25 µm. j, E-cadherin immunohistochemistry in C57BL/6 (WT) and Mkk3−/− Mkk6+/− mice or FVB mice treated with SB203580 (see Methods). k, Quantification of j. Mean ± s.e.m.; *P < 0.01; one-sided, unpaired t-test. l, MCF10A organoids were treated for 6 days with control (DMSO) or SB203580 (5 µM), and fixed and stained for E-cadherin (green). Graph, percentage of E-cadhi organoids in two experiments; 15 organoids per treatment per trial. Scale bars, 10 µm. Mean ± s.e.m.; *P < 0.01; one-sided, unpaired t-test. m, Top, immunofluorescence for β-catenin (red) on mammary gland sections of DMSO- or SB203580-treated FVB mice (see Methods). Scale bars, 25 µm. Bottom, immunofluorescence for α-smooth muscle actin (SMA, red), CK8/18 (green) and DAPI (blue) on the same tissues. Scale bars, 20 µm.
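Panels f and g of Extended Data Fig. 5 report qPCR results normalized to GAPDH and expressed as fold change over control. The captions do not state the quantification model; assuming the widely used comparative Ct (2^-ΔΔCt) method, with technical triplicates averaged before normalization, the calculation for one biological replicate can be sketched as follows (function names and Ct values are hypothetical):

```python
from statistics import mean
from typing import Sequence


def delta_ct(target_ct: Sequence[float], reference_ct: Sequence[float]) -> float:
    """Mean target Ct minus mean reference (e.g. GAPDH) Ct for one sample."""
    return mean(target_ct) - mean(reference_ct)


def fold_change(target_treated: Sequence[float], ref_treated: Sequence[float],
                target_control: Sequence[float], ref_control: Sequence[float]) -> float:
    """Comparative Ct fold change: 2 ** -(dCt_treated - dCt_control)."""
    dd_ct = delta_ct(target_treated, ref_treated) - delta_ct(target_control, ref_control)
    return 2.0 ** (-dd_ct)


# Hypothetical technical-triplicate Ct values for one biological replicate
fc = fold_change(target_treated=[27.0, 27.1, 26.9], ref_treated=[18.0, 18.0, 18.0],
                 target_control=[25.0, 25.0, 25.0], ref_control=[18.0, 18.0, 18.0])
```

A higher Ct means less template, so a treated sample whose target Ct rises by two cycles relative to GAPDH corresponds to roughly a four-fold decrease, assuming near-100% amplification efficiency.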
Extended Data Figure 6 | Wnt signalling in Her2+ organoids and eDCC characterization. a, qPCR confirmation of EMT genes identified in Fig. 3g comparing MCF10A and MCF10A-HER2 organoids. Mean ± s.e.m. shown as fold change over control. Values normalized to GAPDH from triplicate samples. *P < 0.05; one-sided, unpaired t-test. b, qPCR confirmation of genes identified in Fig. 3g in MCF10A-HER2 organoids treated with DMSO or SB203580. Mean ± s.e.m. shown as fold change over control. Values normalized to GAPDH from triplicate samples. *P < 0.05; one-sided, unpaired t-test. c, qPCR for CDH1 mRNA in MCF10A-HER2 organoids treated for 6 days with SB203580 (5 µM) or p38α siRNA (20 nM). Fold change over control for biological triplicates. DMSO, control for SB203580; scrambled siRNA, control for p38α siRNA. Mean ± s.e.m.; *P < 0.05; one-sided, unpaired t-test. d, Western blot for haemagglutinin (HA)-tagged SFRP1 constructs in MCF10A-HER2-SFRP1 cell lines. Gel source data, see Supplementary Fig. 1. e, Axin2 mRNA levels in MCF10A-HER2 and MCF10A-HER2-SFRP1 cells treated with or without SB203580 (5 µM) for 24 h. Fold change over control; error bars denote s.e.m. for biological sextuplicates. *P < 0.05; one-sided, unpaired t-test. f, Axin2 mRNA levels measured in MCF10A cultures transfected with pcDNA3 (empty vector) or CA-p38α (D176A and F372S mutant) plasmids and then treated with or without WNT3A for 24 h. Fold change over control is shown; error bars denote s.e.m. for biological triplicates. *P < 0.02; one-sided, unpaired t-test. g, Percentage of outward-invading cells from MCF10A-HER2 and MCF10A-HER2-SFRP1
organoids treated for 6 days with DMSO or SB203580 (5 µM). n = 20 organoids per treatment, biological duplicates; data are shown as mean ± s.e.m.; *P < 0.05; one-sided, unpaired t-test. h, Left, E-cadherin (green) in MCF10A-HER2 and MCF10A-HER2-SFRP1 organoids treated for 6 days with SB203580 (5 µM). Right, β-catenin (red) in organoids treated as on the left. Insets (h1-h4) show magnified boxed regions. Graph, percentage of E-cadhi (green bars, left axis) and β-cateninMEM (red bars, right axis) organoids. Error bars denote s.e.m.; NS, not significant; *P < 0.003; **P < 0.02; one-sided, unpaired t-test; n = 20 organoids per treatment, biological duplicates. Scale bars, 10 µm. i, Quantification of early-lesion or primary-tumour cancer cells with the indicated profiles; 4 animals per group. **P < 0.01; one-sided Mann-Whitney U-test; mean ± s.e.m. Bottom, immunofluorescence for Twist1hi (T+) protein in HER2+ (H+) cancer cells in early lesions (n = 883 cells) or primary tumours (n > 3,000 cells). j, Immunofluorescence for p-H3 (green) and Her2 (red) in eDCCs from MMTV-Her2 lung sections. Scale bars, 10 µm. k, Immunofluorescence for Her2 (red) and p-Rb (green) in spontaneous primary MMTV-Her2 tumours. Scale bar, 10 µm. l, Representative image of Her2+Twist1+ lung DCCs from 33-week-old MMTV-Neu mice. n = 500 cells, 4 animals per group. Quantification shown in Fig. 4f. Scale bar, 10 µm. m, Percentage of Her2+ and p-Rb+ cells per field of view (FOV) in MMTV-Her2 mice treated as in Fig. 3c. Lung sections from 3 animals. *P < 0.02; ***P < 0.0001; one-sided, unpaired t-test; error bars represent ± s.e.m.
Extended Data Figure 7 | Metastasis-initiating potential of HER2+ early lesion cells. a, Experimental approach for testing tumorigenic and metastatic potential. Early lesion cells from mouse mammary glands (age, 12-18 weeks) and primary tumour cells were seeded in mammosphere medium (see representative images). Approximately 300 mammospheres were injected into the fat pad of nude mice. Primary tumour formation and metastasis were monitored for 1, 3 and 12 months (mammospheres group) or for 3 months (tumourspheres group). Primary tumour and metastasis incidence are shown in Fig. 4g. b, Sphere-forming efficiency for early lesion cancer cells (age, 16 weeks) and primary tumour cancer cells. After one week (1) in culture, spheres were disaggregated and replated to test self-renewal capacity for another week (2). n = 6 replicates; one-sided, unpaired t-test; data are mean ± s.d. Representative of 3 biological replicates. c, Left, haematoxylin and eosin staining of lung macro-metastasis in nude mice injected with MMTV-Her2 early lesion mammospheres. Scale bar, 200 µm. Right, immunohistochemistry for Her2+ DCCs in mice injected with tumourspheres. Scale bar, 10 µm.
Arrows, Her2+ DCCs; asterisks, Her2− cells. d, Left, immunofluorescence detection of p-ERK1/2 and p-S6 in organoids produced by MMTV-Her2 early lesion or primary tumour cells. Right, percentage of p-S6+ or p-ERK+ organoids per well. n = triplicates; one-sided, unpaired t-test; mean ± s.e.m. e, Mammospheres from early lesion cells or tumourspheres from primary tumour cells were directly embedded in 3D Matrigel to monitor organoid behaviour for 3 days. Top, percentage of invasive organoids in each group. EL, early lesion; PT, primary tumour. Bottom, representative images used to quantify the invasive nature of early lesion mammospheres (left) compared to primary tumour tumourspheres (right). P < 0.0021; one-sided, unpaired t-test; mean ± s.e.m. f, Early lesion and primary tumour single-cell suspensions were injected intravenously (tail vein) in nude mice (50,000 cells per animal). Lungs were collected (after 4 weeks) and processed for haematoxylin and eosin staining and for immunofluorescence detection of Her2 (right). Graph, number of metastatic nodules per section per animal lung (n = 3 mice in each group). NS, not significant; Mann-Whitney U-test; median and range are shown.


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 




LETTER 


[Plot: tumour volume per animal (mm³) for groups MS and TS; p < 0.0001]


Extended Data Figure 8 | Final tumour volume of early lesions or primary spheres after orthotopic injection into nude mouse mammary fat pads. Animals were randomized, and approximately 300 spheres from early lesions or primary tumours were injected per site into nude mice (BALB/c nude, Charles River) with Matrigel (Corning 356231) at a 1:1 ratio. Spheres were injected into both fourth (inguinal) mammary fat pads using a 27-gauge needle. Mice injected with tumour-derived spheres were euthanized when tumours reached 1,000 mm³, according to IACUC regulations. Tumour volumes were measured at 3 months. Mammospheres, n = 15 animals; tumourspheres, n = 13 animals. One-sided Mann-Whitney U-test with 95% confidence intervals.
Extended Data Figure 9 | Cartoon depicting the mechanism of early dissemination by Her2+ early lesion cells. a, Her2+ early lesion cancer cells (red) turn on Wnt, PI3K and AKT signalling and inhibit p38 activation and E-cadherin-junction formation, allowing for a Twist1hi EMT-like invasive program; p38 and E-cadherin inhibit the Wnt- and β-catenin-driven EMT-like program and invasion (grey inhibitory symbols). b, Her2+ p-p38lo Twist1hi E-cadlo early lesion cancer cells, which retain CK8/18 expression, can intravasate and disseminate. c, In lungs, more than 85% of eDCCs (red) were Her2+ E-cadlo (p-Rb or p-H3)lo, suggesting a large population of dormant cells. Most eDCCs are also Twist1hi E-cadlo. Nevertheless, eDCCs can initiate metastasis, which correlated with the acquisition of a Twist1lo E-cadMEMhi phenotype. In the bone marrow, eDCCs were Her2+ CK8/18+ and remained dormant for the duration of the experiments, as bone lesions were never observed.
Extended Data Table 1 | Quantification of eCCCs and eDCCs in MMTV-Her2 mice

Disseminating population | Detected in MMTV-Her2 mice | Median (min-max)
eDCC, bone marrow (BM) | 100% | 86.54 (28-350) per BM (4 hind-limb long bones)

eCCCs, as well as lung and bone-marrow eDCCs, could be detected in 100% of mice (column two) (n = 4) (for images, see Extended Data Fig. 4h-j). The median number of CK8/18+ eCCCs per ml of blood from cytospin preparations of whole blood samples is shown with range (n = 4). Bone-marrow eDCCs were detected as CK8/18+, Her2+ or double-positive cells in cytospin preparations from 4 hind-limb long bones (2 tibiae, 2 femurs), and the number of eDCCs in the whole bone marrow is shown with minimum and maximum values (n = 5). The median percentage of lung eDCCs per field detected by immunohistochemistry of lungs is shown with the minimum and maximum number of DCCs (n = 30 fields, n = 3 animals). The median number of eDCCs in the bone-marrow compartment is equivalent to approximately 20.2 eDCCs per 10⁶ bone-marrow host cells in control groups and approximately 46 eDCCs per 10⁶ bone-marrow host cells in SB203580-treated animals.
mRNA quality control is bypassed for immediate
export of stress-responsive transcripts


Gesa Zander1*, Alexandra Hackmann1*, Lysann Bender1*, Daniel Becker1, Thomas Lingner2, Gabriela Salinas2 & Heike Krebber1


Cells grow well only in a narrow range of physiological conditions. 
Surviving extreme conditions requires the instantaneous expression 
of chaperones that help to overcome stressful situations. To ensure 
the preferential synthesis of these heat-shock proteins, cells inhibit 
transcription, pre-mRNA processing and nuclear export of non- 
heat-shock transcripts, while stress-specific mRNAs are exclusively 
exported and translated. How cells manage the selective retention
of regular transcripts and the simultaneous rapid export of heat- 
shock mRNAs is largely unknown. In Saccharomyces cerevisiae, the 
shuttling RNA adaptor proteins Npl3, Gbp2, Hrb1 and Nab2 are
loaded co-transcriptionally onto growing pre-mRNAs. For nuclear 
export, they recruit the export-receptor heterodimer Mex67-Mtr2 
(TAP-p15 in humans). Here we show that cellular stress induces the
dissociation of Mex67 and its adaptor proteins from regular mRNAs 
to prevent general mRNA export. At the same time, heat-shock 
mRNAs are rapidly exported in association with Mex67, without 
the need for adaptors. The immediate co-transcriptional loading
of Mex67 onto heat-shock mRNAs involves Hsf1, a heat-shock 
transcription factor that binds to heat-shock-promoter elements 
in stress-responsive genes. An important difference between the 
export modes is that adaptor-protein-bound mRNAs undergo 
quality control, whereas stress-specific transcripts do not. In fact, 
regular mRNAs are converted into uncontrolled stress-responsive 
transcripts if expressed under the control of a heat-shock promoter, 
suggesting that whether an mRNA undergoes quality control 
is encrypted therein. Under normal conditions, Mex67 adaptor 
proteins are recruited for RNA surveillance, with only quality- 
controlled mRNAs allowed to associate with Mex67 and leave the 
nucleus. Thus, at the cost of error-free mRNA formation, heat-shock 
mRNAs are exported and translated without delay, allowing cells to 
survive extreme situations. 

To investigate how regular mRNAs are retained in the nucleus during stress, we analysed whether Mex67 and its adaptor proteins, or the loading factors Yra1 and Hpr1 of the THO complex (which helps couple mRNA transcription with processing and export into the cytoplasm), dissociate from mRNAs. RNA co-immunoprecipitation (RIP) experiments revealed that this is indeed the case, whereas the splicing factor Prp17 remained bound (Fig. 1a and Extended Data Fig. 1a). Gbp2 required shorter time-shifts because it aggregates reversibly (Extended Data Fig. 1b). The dissociation of Mex67 and its adaptors is not complete, because dissociation occurs mainly in the nucleus. The cytoplasmic mRNA pool is excluded from translation as it accumulates with bound proteins in stress granules. It is likely that Mex67 adaptor proteins dissociate to remove the export receptor from the mRNA, as the interaction between the two did not change upon stress (Fig. 1b and Extended Data Fig. 1c).

To identify the requirements of heat-shock mRNA export, we conducted fluorescence in situ hybridization (FISH) experiments and found export defects in mex67-5 mutants, as has been observed previously, and, as expected, in mtr2-21 mutants, because both proteins act as a heterodimer. Notably, mutations in each of the Mex67 adaptor proteins, even when all were combined, did not affect the export of heat-shock mRNAs (Extended Data Fig. 1d-k). Although heat-shock mRNAs are expressed at a low level at 37 °C (Extended Data Fig. 1g), this temperature does not induce the full heat-shock response, including an mRNA export block, which is the case at 42 °C. For this reason, we performed the heat stress analyses at 42 °C.

Genome-wide microarray and RNA sequencing (RNA-seq) exper- 
iments analysing bound RNAs after co-immunoprecipitation (co-IP) 
with Npl3 or Mex67 confirmed the dissociation of Npl3 and Mex67 
from mRNAs during heat stress (Extended Data Fig. 2a-f and 
Supplementary Information). Moreover, compared to Npl3, Mex67 
binds strongly to newly synthesized heat-shock transcripts, including 
those from heat-shock promoter element (HSE)-containing genes
(Fig. 1c, Extended Data Fig. 3a, b and Supplementary Information). 
Quantitative reverse-transcriptase-PCR (qRT-PCR) experiments 
confirmed this finding for selected transcripts (Extended Data 
Fig. 3c). 

These results suggest that bulk mRNAs are retained in the nucleus 
during heat shock through the dissociation of adaptor protein-Mex67 
complexes, whereas freely available Mex67 selectively exports heat- 
shock mRNAs. For this to happen, Mex67 could bind either directly 
to heat-shock mRNAs or to an unknown stress-specific adaptor pro- 
tein. The direct binding of Mex67 to 5S rRNA has been demonstrated 
previously’. To investigate whether Mex67 can also bind to mRNAs, 
we performed in vitro binding studies of recombinant glutathione
S-transferase (GST)-tagged Mex67-Mtr2 or Npl3 with purified total
RNA and found strong mRNA binding. This indicates that Mex67 can
bind directly to mRNAs and does not discriminate between regular and
stress-specific transcripts (Fig. 2a and Extended Data Fig. 4a).

The binding of rRNA to Mex67 requires the loop domain of the
protein (ref. 9; Extended Data Fig. 4c), which also allows Mex67-Mtr2 to
bind mRNAs as shown by in vitro RNA-binding studies using stably 
expressed loop-domain mutants (Fig. 2b and Extended Data Fig. 4b, d). 
Notably, in vivo these mutants also affect heat-shock mRNA export 
(Fig. 2c and Extended Data Fig. 4e-g). 

To determine whether this loop domain also forms the binding plat- 
form for the adaptor proteins under normal conditions, we conducted 
co-IP experiments, assaying binding of Npl3 to the Mex67 wild-type 
protein and the loop-domain mutants. We found strong binding of 
Npl3 to wild-type Mex67, but not to the loop mutants (Fig. 2d and 
Extended Data Fig. 4h). A similar result was also obtained in vitro using 
purified recombinant proteins (Extended Data Fig. 4i, j), suggesting 
that the binding of both RNA and adaptor proteins to Mex67 requires 
the same loop domain. In fact, in vitro competition assays show that 
purified yeast RNA, but not DNA, disrupts a pre-formed Mex67-Npl3 
complex (Fig. 2e and Extended Data Fig. 4k). 


¹Abteilung für Molekulare Genetik, Institut für Mikrobiologie und Genetik, Göttinger Zentrum für Molekulare Biowissenschaften, Georg-August-Universität Göttingen, Göttingen, Germany.
²Transkriptomanalyselabor, Institut für Entwicklungsbiochemie, Georg-August-Universität Göttingen, Göttingen, Germany.
Figure 2 | Mex67 directly binds and exports heat-shock mRNAs using
the same domain as it uses to interact with the adaptor protein Npl3.

a, Mex67 binds to coding RNAs in vitro. Recombinant GST-Mex67/His6-Mtr2,
GST-Npl3 or, as a negative control, GST/His6-Mtr2 were incubated
with total yeast RNA isolated from heat-stressed cells. After protein
immunoprecipitation, the amount of bound mRNA was determined by
qRT-PCR. Fold change is measured against binding of GST alone.

b, Mex67 mutants with defective loop domains fail to bind mRNA in vitro.
Purified recombinant full-length Mex67/His6-Mtr2, the indicated mutant
proteins and His6-Mtr2 (negative control) were incubated with total yeast
RNA and re-purified via Ni-NTA agarose. Bound transcript amounts
were determined by qRT-PCR. Data are normalized to the RNA
binding of His6-Mtr2. c, FISH experiments (top) and quantifications
(bottom) reveal export defects of the heat-shock mRNA SSA4 (green) in
mex67 loop-domain mutants after 30 min at 42°C (≥20 cells). d, The
RNA-binding domain of Mex67 is required for interaction with Npl3
in vivo. Western blot analysis of Mex67 co-immunoprecipitation is
shown. e, Increasing amounts of RNA, but not DNA, significantly
reduce interaction of Npl3 and Mex67 in in vitro competition assays.
***P < 0.001, **P < 0.01, *P < 0.05; two-tailed, two-sample unequal
variance t-test; n ≥ 3 for all panels.
Figure 3 | Stress induces direct recruitment of Mex67 to promoters
of heat-shock genes. a, ChIP assays reveal an increase in the binding
of Mex67 to heat-shock transcripts. ChIPs with Npl3 and Mex67 in
relation to Rpb1 are shown. b, Increased interaction of Mex67 with RNA
polymerase II during heat stress. Western blot analysis of Rpb1 co-IPs
detects Mex67. E, eluate; L, lysate; W, wash. c, Hsf1 interacts with Mex67
at 42°C. Western blot analysis of Hsf1 co-IPs detecting Mex67 is shown.
WT, wild type; Δ, mft1Δ. **P < 0.01, *P < 0.05; two-tailed, two-sample
equal variance t-test, n ≥ 3 (a).


During heat stress, Mex67 binds to heat-shock transcripts that are 
expressed by genes that contain HSEs in their promoters (Fig. 1c). 
These elements are bound by the highly conserved heat-stress-specific
transcriptional activator Hsf1. Hsf1 can bypass the requirements
for a number of critical general transcription factors, including the 
C-terminal domain of RNA polymerase II (ref. 10). Instead, Hsf1 
itself recruits important factors such as the Mediator transcriptional 
co-activator complex’!. Chromatin immunoprecipitation (ChIP) 
showed that Npl3, but not Mex67, binds to HEM15 at 25°C, whereas 
Mex67, but not Npl3, is recruited to HSP12 at 42°C (Fig. 3a). Consistent 
with this, Mex67 does not interact with the largest RNA polymerase II
subunit (Rpb1) under regular conditions, but does during heat stress
(Fig. 3b and Extended Data Fig. 4l). Mex67 may be loaded
onto the transcription machinery via Hsf1, as the two interact during
heat stress (Fig. 3c). Although the THO complex was shown to be
required for RNA export under heat stress”, it does not influence the
interaction between Hsf1 and Mex67, as seen in the THO complex
mutant mft1Δ.

The direct loading of Mex67 onto heat-shock mRNAs, without the 
need for adaptor proteins, raises two questions. It is unclear why such 
a direct export mode was not established for all mRNAs, and why cells 
retained adaptor proteins during evolution if it is possible to export mRNA
without them. In fact, the shuttling mRNA adaptor proteins have been 
suggested to form part of a nuclear mRNA surveillance system that 
prevents the nuclear export of incorrect, possibly improperly processed 
or assembled, ribonucleoproteins (RNPs). The principal components
of this surveillance system include the TRAMP complex (Trf4/Trf5, 
Air1/Air2, Mtr4), which tags incorrect transcripts with short poly(A) 
tails, marking them for subsequent degradation by the Rrp6-containing 
exosome!’. Additionally, Mlp1 interacts with adaptor proteins and acts
as a final gatekeeper at the nuclear pore complex (NPC) by retaining 
immature transcripts in the nucleus!*")’. 

Adaptor proteins are loaded onto mRNAs at different processing 
steps: Npl3 associates close to the 5’ cap??!, Gbp2 and Hrb1 asso- 
ciate during splicing!””°”? and Nab2 associates preferentially at the 
3’ end*®”3. Therefore, the adaptor proteins might control individual
maturation steps, marking mRNAs as export-competent via binding 
of Mex67 to allow subsequent Mlp1-controlled nuclear export!’. To 
visualize their function in quality control, we utilized a FISH assay that 
monitors bulk mRNAs'”. While mRNAs are evenly distributed in wild-type
cells or in cells with GBP2, HRB1 or NPL3 deletions, the rrp6Δ mutant
accumulates in the nucleus RNAs that are usually degraded
(Fig. 4a and Extended Data Fig. 5a-d). As has been demonstrated, the
Figure 4 | Stress-specific mRNAs are not quality controlled. a, FISH
experiments with a Cy3-labelled oligo d(T)50 probe (red) show that, in the
absence of Mex67 adaptor proteins, mRNAs that would ordinarily be degraded
leak into the cytoplasm (3 h at 30°C). b, Promoter-dependent
accumulation of transcripts in quality-control-factor mutants.
Strains carrying plasmids with indicated gene constructs were analysed in 
FISH experiments at 42°C for 30 min (top) and quantified (bottom) with 
oligonucleotides targeting GFP mRNA (red) and poly(A)+ RNA (green)
(n ≥ 3, ≥20 cells). c, Incorrect transcripts are not retained in the nucleus
upon heat stress. Northern blot analysis of cytoplasmic fractionation 
experiments detecting intron-containing RPL23B transcripts, expressed 
from the indicated promoters. d, mRNA quality-control factors are 
dispensable for the heat-shock response. Serial dilutions of severely heat- 
stressed (right; 52°C for 20 min) strains are shown on full medium plates. 


simultaneous deletion of GBP2 and HRB1 alleviates this phenotype’”,
as do the double mutants npl3Δ rrp6Δ and nab2Δ rrp6Δ, allowing
poly(A)+ RNA to leak into the cytoplasm. Expression of mex67-5 in
any of the adaptor-protein-deletion rrp6Δ double mutants prevented this
leakage, indicating that the aberrant nuclear escape of incorrect transcripts
depends on functional Mex67 (Extended Data Fig. 5e, f). Whereas deletion of
the adaptor proteins caused enhanced cytoplasmic mRNA leakage, 
the opposite was observed when intact or mutated quality control 
factors were overexpressed, which is toxic and leads to nuclear mRNA 
retention!>*4 (Extended Data Fig. 5g-j). These results uncover an 
important function of Mex67 adaptor proteins in nuclear mRNA quality 
control. Their role is likely to monitor mRNA processing and, if processing
events take too long, to recruit the degradation machinery instead of Mex67,
as has previously been shown for Gbp2 and Hrb1 (ref. 17).

As Mex67 can export heat-shock transcripts without the need 
for adaptor proteins, it is conceivable that these transcripts undergo 
no quality control. This is of particular relevance, because the RNA 
degradation machinery is functional during heat stress and degrades 
heat-shock RNAs under certain conditions, for example, when these 
transcripts are artificially trapped in the nucleus”*”®. In this situation, 
it is possible that the transcripts are not being properly covered by 
proteins and are therefore not protected against degradation. Thus,
by preventing adaptor proteins from binding to heat-shock mRNAs, 
their potential quality-control-mediated degradation would be circum- 
vented. Our in vivo leakage assay clearly showed that, in contrast to 
regular mRNAs, heat-shock mRNAs do not accumulate in the nuclei 
of rrp6 or mtr4 mutants at 42 °C (Extended Data Fig. 6a—c), suggest- 
ing that stress-specific transcripts do not undergo adaptor-protein- 
mediated quality control. During stress, the quality control factors Nab2 
and Mlp1 are eliminated from the NPCs and accumulate in nuclear 
foci?, which probably supports uncontrolled mRNA export. 

It is possible that omitting quality control of heat-shock transcripts 
might speed up heat-shock protein expression, increasing the cell’s 
chance of survival. However, this raises the question of how the cell 
knows to send heat-shock transcripts to ‘fast-track export’. This is likely 
to be determined by the gene promoter, because, as has been shown’, 
HSEs are required for translation during stress and Hsf1 recruits Mex67 
(Fig. 3). Promoter-swap experiments show that GPM1 or CYC1 tran- 
scripts, expressed under their own promoter, strongly accumulate 
in mtr4 and rrp6 mutants. This is not the case when these genes are 
expressed under the control of the HSP12 promoter or their own pro- 
moters that have been modified to contain artificial HSEs, suggesting 
that these RNAs are not quality controlled (Fig. 4b and Extended Data 
Fig. 7a—d). Vice versa, the HSP12 transcript undergoes quality control 
when it is expressed under the control of the GPM1 promoter, suggesting
that whether a transcript undergoes quality control is encoded in the
promoter.

To demonstrate that heat-stress-responsive transcripts bypass 
quality control, we expressed the intron-containing housekeeping 
mRNA RPL23B under the control of the HSP12 promoter and verified 
the export of unprocessed transcripts to the cytoplasm (Fig. 4c and 
Extended Data Fig. 7e). At the cellular level, the bypass of quality control
during heat stress is reflected in a recovery assay, in which the rate of
cell survival indicates participation in the stress response.
Whereas disturbances in mRNA export by, for example, mutations in 
MEX67 or the nucleoporin RIP1/NUP42 drastically affect heat-shock 
survival rate, the absence of quality control factors does not (Fig. 4d). 

Together, our data indicate that stress precludes adaptor-protein- 
mediated quality control and export of bulk mRNA. Instead, it induces 
the direct Mex67-mediated export of heat-shock mRNA without quality 
control (Extended Data Fig. 7f). Thus, at the cost of accuracy, cells use 
a simplified mRNA export route for heat-shock transcripts, guaran- 
teeing their rapid translation and enabling cell survival under extreme 
conditions. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized and the investigators were not blinded to allocation during 
experiments and outcome assessment. 

Yeast strains, plasmids and oligonucleotides. All yeast strains used in this study 
are listed in Extended Data Table 1, oligonucleotides in Extended Data Table 2 and 
plasmids in Extended Data Table 3. Plasmids and yeast strains were generated by 
conventional methods. 

Co-immunoprecipitation experiments. The experiments were essentially
performed as published previously’. All yeast strains were grown to log phase (2-3 × 10⁷
cells/ml). For the experiment shown in Fig. 1b, the cells were split into three por- 
tions: one served as an unstressed control and the cells of the two other portions 
were either incubated for 15 min at 42°C to apply heat stress or treated with 1M 
NaCl for 1h at 25°C for salt stress. Cells for the precipitation shown in Fig. 2d 
were not shifted. For co-IP analyses shown in Fig. 3b, c, cells were split into two 
portions: one served as an unstressed control (25°C) and the other was incubated 
for 15 min at 42°C. Afterwards, the cells were collected immediately and lysed in 
immunoprecipitation buffer (1x PBS, 3mM KCl, 2.5mM MgCl2, 0.5% Triton
X-100, 200 μg/ml RNase A, vanadyl phosphatase inhibitors and protease inhibitors
from Roche). The supernatant was incubated for 3-4h at 4°C (Figs 1b, 2d) or for 
20 min, at either 25°C (for the unstressed cell lysate) or 37°C (for the heat-stressed 
lysate) (Fig. 3b, c) with protein G sepharose beads (Amersham Biosciences) bound 
to Myc (9E10)-antibody (Santa Cruz), HA-antibody (Santa Cruz) or Mex67 anti- 
serum (own) or with GFP-Trap_A beads (Chromotek). The matrix was washed 
six times with immunoprecipitation buffer and proteins were detected by western 
blot analyses with the indicated antibodies (GFP (Pierce) 1:5,000; Myc (9E10) 
(Santa Cruz) 1:1,000; Tdh1 (Pierce) 1:5,000; Hem15 (R. Lill) 1:7,000; Mex67 (C. 
Dargemont) 1:2,000; Mex67 (rabbit, serum) 1:50,000; HA (Santa Cruz) 1:1,000;
Rps3 (rabbit, own serum) 1:750). Signals were detected with the Fusion SL system
(PeqLab). Intensities were quantified using the Bio1D software. 

RIP experiments. The experiments were essentially carried out as described
previously’. All yeast strains were grown to mid-log phase (2-3 × 10⁷ cells/ml). For
the experiment shown in Fig. 1a and Extended Data Fig. 1c, cells were split into
three portions: one cell portion served as an unstressed control, one portion was 
incubated for 15 min (7 min for Gbp2) at 42°C to apply heat stress and the final 
portion was treated with 1 M NaCl for 1h at 25°C for salt stress. Afterwards the 
cells were immediately collected and lysed in RIP buffer (25mM Tris HCl pH 7.5, 
100mM KCl, 0.2% (v/v) Triton X-100, 0.2mM PMSF, 5mM DTT, 10U RiboLock
RNase Inhibitor (Thermo Scientific) and protease inhibitor (Roche)). The super- 
natant was incubated for 3-4h at 4°C with protein G sepharose beads (Amersham 
Biosciences) bound to Myc (9E10)-antibody (Santa Cruz) or with GFP-Trap_A 
beads (Chromotek). The matrix was washed six times with RIP buffer and split 
into two portions after the last washing step. Proteins were detected by western 
blot. RNA was extracted using phenol/chloroform and further purified using an 
RNA purification kit (Macherey & Nagel). All of the purified RNA was used for 
subsequent dot-blot experiments. 

For Fig. 1c and Extended Data Figs 2, 3 (microarray and RNA-seq) yeast cells 
were grown at 25°C in yeast extract peptone dextrose (YPD) medium to log phase. 
The cells were collected and resuspended in 50 ml YPD medium and incubated at 
either 25°C or 42°C in a Petri dish for 30 min. Subsequently, cells were irradiated
with UV (254nm, 120,000 μJ/cm²) to induce cross-linking. After centrifugation,
cell pellets were washed in 1 ml RIP buffer (50 mM Tris-HCl, pH 7.5, 150 mM NaCl, 
0.2% (v/v) Triton X-100, 2mM DTT, 10 U RiboLock RNase Inhibitor (Thermo 
Scientific)). One volume of the pellet was mixed with 1.5 volumes of RIP buffer, 
containing EDTA-free protease inhibitor cocktail (Roche) and 1 volume of glass 
beads. Cells were lysed by vigorous mixing for 30s at 4m/s using the FastPrep-24 
machine (MP Biomedicals). Co-IP experiments were performed at 4°C for 4h by 
incubating the lysates with protein G sepharose beads bound to monoclonal Myc 
(9E10) or GFP-Trap_A beads. Afterwards beads were washed five times with RIP 
buffer and treated with 20 μg proteinase K at 55°C for 20 min to remove
RNA-bound proteins. Subsequently, eluates were purified via TRIzol-chloroform (Ambion
RNA, Life Technologies) extraction. Contaminating DNA was removed with the 
TURBO DNA-free kit (Ambion RNA by Life technologies). The purified RNA 
was reverse transcribed with Maxima reverse transcriptase (Thermo Scientific) 
for subsequent qRT-PCR analyses. 

Dot-blot analyses. For dot blot experiments (Fig. 1a and Extended Data Fig. 1a, c)
RNA was spotted onto a Hybond N+ nylon membrane (GE Healthcare) and UV
cross-linked (254 nm, 120,000 μJ/cm²). The membrane was incubated at 80°C for
2h, pre-hybridized for 1h in hybridization buffer (0.5 M sodium phosphate buffer, 
pH 7.5, 7% SDS, 1mM EDTA) and hybridized overnight at 42°C in
hybridization buffer with a digoxigenin (DIG)-labelled oligo d(T)50 probe. The membrane
was washed once with 2x SSC buffer, 0.1% (w/v) SDS; once with 1 x SSC buffer, 
0.1% (w/v) SDS for 15 min at room temperature; and twice with 0.5x SSC buffer,
0.1% (w/v) SDS for 15 min at 42°C. For detection of the signal, the
manufacturer’s instructions (Roche) were followed. Signal intensities were quantified using
the Fusion camera (Peqlab) and Fiji software, and compared to the signal of the 
unstressed samples, always in relation to the precipitated protein. 
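The dot-blot normalization described above can be sketched in Python. Bound poly(A)+ signal is divided by the precipitated-protein signal, and the stressed sample is then expressed relative to the unstressed one. This is a minimal illustration under stated assumptions, not the authors' analysis script. The function names and example intensities are hypothetical, and the intensities are assumed to be background-corrected values such as those exported from Fiji.

```python
def normalized_binding(rna_signal: float, protein_signal: float) -> float:
    """Poly(A)+ dot-blot signal per unit of precipitated protein (western blot signal)."""
    if protein_signal <= 0:
        raise ValueError("precipitated-protein signal must be positive")
    return rna_signal / protein_signal


def binding_relative_to_unstressed(stressed: tuple[float, float],
                                   unstressed: tuple[float, float]) -> float:
    """Normalized binding after stress divided by normalized binding without stress.

    Each argument is a (rna_signal, protein_signal) pair. A result of 1.0 means
    no change; values below 1.0 indicate dissociation from poly(A)+ RNA.
    """
    return normalized_binding(*stressed) / normalized_binding(*unstressed)


# Hypothetical intensities (arbitrary units), not data from this study.
ratio = binding_relative_to_unstressed(stressed=(300.0, 150.0), unstressed=(1000.0, 125.0))
print(ratio)
```

With these placeholder values the stressed sample binds 2.0 units of RNA per unit of protein, compared with 8.0 without stress, so the relative binding is 0.25.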

FISH experiments. The experiments were essentially carried out as described 
previously!”. RNA probes were synthesized by in vitro transcription, using the 
T7 RNA polymerase (Thermo Scientific) and labelled with DIG-UTP using an 
RNA labelling mix (Roche) or with Cy3-labelled oligonucleotides (Sigma), which 
are listed in Extended Data Table 2. To detect poly(A)+ RNA a Cy3- or Atto488-
labelled oligo d(T)50 probe (Sigma) was used. Cells were grown to log phase before
being subjected to heat stress at 42 °C for 30 min or 1 h. For leakage analyses, cells
were shifted to 30°C (Fig. 4a and Extended Data Fig. 5a, f, h) or to 37 °C (Extended 
Data Fig. 5c, j) for 3h. For visualization of mRNAs under control of the GPM1 
promoter, cells were grown in 1% glucose (in the respective medium). Glucose con- 
centration was adjusted to 4% at 15 min before heat stress. Samples were fixed by 
adding formaldehyde, giving a final concentration of 4%. Cells were spheroplasted 
by adding Zymolyase, permeabilized in 0.1 M potassium phosphate buffer pH 6.5
with 1.2 M sorbitol and 0.5% Triton X-100, pre-hybridized with Hybmix (50%
deionized formamide, 5x SSC buffer, 1x Denhardt's, 500 μg/ml tRNA, 500 μg/ml
salmon sperm DNA, 50 μg/ml heparin, 2.5 mM EDTA pH 8.0, 0.1% Tween 20,
10% dextran sulfate) for 1h on a polylysine coated slide at 37°C and hybridized in 
Hybmix with the specific probe overnight at 37 °C. After hybridization, cells were 
washed with 2x SSC buffer and 1x SSC buffer at room temperature, each for 1h
and with 0.5x SSC buffer at 37°C and room temperature, each for 30 min. For 
detection of DIG probes, the cells were treated with blocking buffer containing 5% 
heat-inactivated FBS for 1 h and incubated with sheep anti-digoxigenin Fab-FITC 
antibody (Roche) overnight at 4°C or at room temperature for 4h. DNA was 
stained with Hoechst 33342 (Sigma). Microscopy studies were performed with a 
Leica AF6000 microscope and pictures were obtained using the LEICA DFC360FX 
camera and the LAS AF 2.7.3.9 software (Leica) and quantified using Fiji software. 

Microarray and RNA-seq analyses. A detailed description is given in the
Supplementary Information. 

In vitro binding studies. To purify yeast RNAs bound to recombinant Mex67 and
Mtr2, Escherichia coli BL21 DE3 cells carrying pET8c-His6-MTR2 (pHK1279) and
either pGEX4T-1-GST-MEX67 (pHK442), pGEX4T-1-GST-NPL3 (pHK1276)
or, as a negative control, pGEX4T-1-GST (pHK439) were grown for two days
at 16°C with 2 mM isopropyl-β-D-thiogalactopyranoside (IPTG). Cells were
collected and sonicated in binding buffer (5% (v/v) glycerin, 100 mM NaCl,
2mM MgCl2, 20mM HEPES, 0.14% (v/v) β-mercaptoethanol and protease inhibitor
from Roche). Protein complexes were purified via affinity purification with
Glutathione Sepharose 4B (GE Healthcare).
The sepharose (20 μl of 50% slurry) was incubated with bacterial lysate containing
the respective recombinant proteins and purified total yeast RNA from stressed
and unstressed cells (50 μg/sample) for 1h at 4°C. The sepharose was washed six
times with binding buffer and split into two portions, one of which contained
10% of the beads and was used to control pull-down efficiency by western blot analysis.
The other fraction was treated with proteinase K at 37°C for 30 min. RNA was 
extracted with phenol/chloroform and further purified with an RNA purification 
kit (Macherey & Nagel). cDNA was prepared with Maxima Reverse Transcriptase
(Thermo Scientific) with random hexamer primers according to the manufacturer's 
instructions. Abundance of mRNA was determined by qRT-PCR using GoTaq 
2x master mix (Promega) and primers specific for each transcript (Extended Data 
Table 2). The mRNA levels were compared to the negative control (Fig. 2a). 
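The comparison of mRNA levels with the negative control can be illustrated with an efficiency-corrected comparative-Ct calculation. This is an assumption, because the text does not state which qRT-PCR quantification model was used. The sketch also assumes equal input RNA in each pull-down and a per-cycle amplification efficiency of 2.0 (perfect doubling) unless a measured efficiency is supplied. The transcript names and Ct values are hypothetical.

```python
def fold_change_vs_control(ct_sample: float, ct_control: float,
                           efficiency: float = 2.0) -> float:
    """Enrichment of one transcript in a pull-down relative to the negative-control pull-down.

    A lower Ct means more template, so the fold change is
    efficiency ** (Ct_control - Ct_sample). Assumes equal input RNA in both
    pull-downs and the same primer pair for both measurements.
    """
    return efficiency ** (ct_control - ct_sample)


# Hypothetical Ct values: (GST-Mex67/His6-Mtr2 pull-down, GST control pull-down).
ct_values = {"SSA4": (25.0, 30.0), "GPM1": (26.0, 29.0)}
for transcript, (ct_mex67, ct_gst) in ct_values.items():
    print(transcript, fold_change_vs_control(ct_mex67, ct_gst))
```

Replacing the default efficiency with a value taken from a primer standard curve changes only the base of the exponent. The ordering of transcripts by enrichment stays the same as long as one efficiency applies to both pull-downs.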

To investigate whether the mutations in the loop domain of Mex67 
(ref. 9) would affect mRNA binding (Fig. 2b), recombinant His6-tagged
Mtr2 and untagged Mex67 pHK1372 (HIS: TEV:MTR2:MEX67), pHK1373 
(HIS: TEV:MTR2:MEX67loopKR > AA) and pHK1374 (HIS: TEV:MTR2:MEX67- 
409-435aaK343E) (constructs described in ref. 9) were co-expressed and purified 
from E. coli Rosetta 2 cells by affinity chromatography using Ni-NTA agarose 
(Macherey & Nagel). As the negative control, pET8c-His6-MTR2 (pHK1279) was
expressed and purified as described above. The purified proteins were diluted
in washing buffer (30 mM HEPES pH 7.5, 100mM NaCl, 10% (v/v) glycerin
and protease inhibitors (Roche)) to a final concentration of 0.45 mg/ml. Samples
were incubated with purified total yeast RNA (100 μg per sample) extracted from
heat-stressed cells (30 min at 42°C) and Ni-NTA agarose (Qiagen) for 60 min at 
4°C. The matrix was treated and divided as described above. Pull down of pro- 
teins was analysed using western blot, RNA was analysed as described above. The 
mRNA levels were normalized to the negative control and expressed relative to
wild type.
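The two-step normalization for the loop-domain mutants reduces to a ratio of enrichments: each construct is first compared with the His6-Mtr2 control and then expressed relative to wild-type Mex67. The sketch below makes a labelled simplification: it uses a comparative-Ct model with a per-cycle efficiency of 2.0 and equal RNA input. The construct labels and Ct values are hypothetical placeholders.

```python
def relative_to_wild_type(ct_values: dict[str, float],
                          control: str = "His6-Mtr2",
                          wild_type: str = "Mex67",
                          efficiency: float = 2.0) -> dict[str, float]:
    """Binding of each construct relative to wild type, after normalizing to the control.

    ct_values maps construct name -> Ct of one transcript in that pull-down.
    Wild type is 1.0 by definition; the control itself is not reported.
    """
    ct_control = ct_values[control]
    # Step 1: enrichment over the negative control for every construct.
    enrichment = {name: efficiency ** (ct_control - ct)
                  for name, ct in ct_values.items() if name != control}
    # Step 2: express each enrichment relative to wild-type Mex67.
    wt = enrichment[wild_type]
    return {name: value / wt for name, value in enrichment.items()}


# Hypothetical Ct values for a single transcript.
print(relative_to_wild_type({"His6-Mtr2": 30.0, "Mex67": 25.0, "mex67 loop mutant": 28.0}))
```

With these placeholder values wild type is enriched 32-fold and the loop mutant 4-fold over the control. The mutant is therefore reported as 0.125, one eighth of wild-type binding.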

Mex67 in vitro co-IP and RNA/DNA competition assay. GST-Npl3 was recom- 
binantly expressed in E. coli Rosetta 2 cells carrying a pGEX4T-1-GST-NPL3 
(pHK1276) plasmid. Cells were collected and bacterial lysate with approximately 1 μg/μl
GST-Npl3 in lysis buffer (20 mM HEPES pH 7.5, 100 mM NaCl, 4mM MgCl2,


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


LETTER 


10% (v/v) glycerin, 1mM DTT, 0.1% NP-40 and protease inhibitors (Roche))
was prepared. Purification of recombinant Mex67, mex67Δloop409-435 and
mex67-loopKR > AA proteins (constructs described in ref. 9) in complex with
Mtr2 is described in the section on in vitro binding studies. For
the co-IP experiments, 50 μg of the respective Mex67 protein, 250 μl GST-Npl3
bacterial lysate, 100 μg RNase A, 550 μl lysis buffer and protein G sepharose beads
(Amersham Biosciences) loaded with anti-Mex67 antibody (rabbit, own serum) 
were mixed and incubated at 4°C for 1.5h. The matrix was washed six times with 
lysis buffer and proteins were detected by western blot with the indicated antibodies 
(anti-Mex67 (rabbit, serum) 1:50,000; anti-GST (mouse, Santa Cruz) 1:2,000). The
anti-Mex67 antibody was used in preference to the anti-GST or anti-His antibodies
once a highly specific antibody against Mex67 had been generated.

For RNA/DNA competition analysis, co-IP was performed as above, but the 
matrix was washed only three times. Following this, 1 ml lysis buffer, 1 μl RiboLock
(Thermo Scientific) and the indicated amount of total yeast RNA (prepared from
cells that were shifted to 42°C for 30 min) or DNA (a 7.3-kb plasmid digested
with BanI, resulting in 6 fragments from 94 bp to 3.1 kb) were added and incubated
for 1h at 4°C. The beads were washed four times with lysis buffer and proteins were
detected as described above. Signal intensities were quantified using the Bio1D
software.
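The text states only that band intensities were quantified with Bio1D. One plausible way to turn those intensities into a competition curve is sketched below. In each lane, the co-precipitated GST-Npl3 signal is divided by the precipitated Mex67 signal, and the result is expressed relative to the lane without competitor. The lane layout, competitor amounts and intensities are hypothetical.

```python
def remaining_binding(lanes: list[tuple[float, float, float]]) -> dict[float, float]:
    """Npl3-Mex67 binding per competitor amount, relative to the no-competitor lane.

    Each lane is (competitor_ug, npl3_signal, mex67_signal).
    """
    ratios = {}
    for competitor_ug, npl3_signal, mex67_signal in lanes:
        if mex67_signal <= 0:
            raise ValueError("precipitated Mex67 signal must be positive")
        # Co-precipitated Npl3 per unit of precipitated Mex67.
        ratios[competitor_ug] = npl3_signal / mex67_signal
    if 0.0 not in ratios:
        raise ValueError("a lane without competitor (0 ug) is required as reference")
    baseline = ratios[0.0]
    return {amount: ratio / baseline for amount, ratio in ratios.items()}


# Hypothetical RNA-competition lanes: 0, 10 and 50 ug competitor.
print(remaining_binding([(0.0, 800.0, 100.0), (10.0, 400.0, 100.0), (50.0, 100.0, 100.0)]))
```

Normalizing to precipitated Mex67 in every lane separates the loss of Npl3 from any lane-to-lane variation in how much Mex67 the beads recovered.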

ChIP experiments. For ChIP experiments, cells were grown at 25°C to OD600 of
0.8. Cultures were split into two equal parts and either shifted to 42°C or further 
incubated at 25°C for 20 min, crosslinked with 1% formaldehyde for 20 min and 
quenched with 250 mM glycine for 5 min. Cells were collected and washed three 
times with TBS (20 mM Tris-HCl, pH 7.5, 150mM NaCl). Pellets were lysed with
1x pellet volume of glass beads and 3 x pellet volume of ChIP lysis buffer (50 mM 
HEPES pH 7.5, 140mM NaCl, 1% Triton X-100, 0.1% sodium-deoxycholate, 
1mM EDTA, 0.1% SDS, 1mM PMSF) for 30s at 5.5 m/s two times using 
the FastPrep machine. The lysates were sonicated in a water bath sonicator at 
100% duty for 2.5 min to obtain a ~350-bp DNA fragment size. A total of 10% 
of the lysate was collected and used as an immunoprecipitation input control. 
Immunoprecipitation with cleared lysate was performed using anti-Myc (Santa 
Cruz) and anti-HA antibodies (Santa Cruz) and G-Sepharose beads (GE) or with 
GFP trap beads (Chromotek) for 3h at 4°C. Beads were washed twice with lysis 
buffer, one time with high-salt buffer (50 mM HEPES pH 7.5, 500 mM NaCl, 
1% TritonX-100, 0.1% sodium deoxycholate, 1mM EDTA), deoxycholate wash 
buffer (10 mM Tris pH 8.0, 250 mM LiCl, 0.5% NP-40, 0.5% sodium deoxycho- 
late, 1 mM EDTA) and twice with TE buffer pH 8.0 (10 mM Tris, 1mM EDTA). 
Immunoprecipitation eluates and input samples were treated with proteinase K for 
1h at 42°C and crosslinks were reversed by overnight incubation at 65°C. DNA 
was purified via phenol/chloroform extraction. qPCR analyses with input and
immunoprecipitation samples were performed on a Rotor-Gene Q cycler to analyse
binding at the 5’-regions of the housekeeping gene HEM15 and the stress-regulated
HSP12 gene. As a control, primers for a non-transcribed region (NTR) of
chromosome V were used, and ΔCt values were calculated from the respective input and
immunoprecipitation samples. Standard curves were used to determine primer
efficiencies (E). Occupancies were calculated relative to the NTR control, according to

$$\text{occupancy} = \frac{E^{(\Delta C_t^{\mathrm{Pos}} - \Delta C_t^{\mathrm{Neg}})_{\mathrm{NTR}}}}{E^{(\Delta C_t^{\mathrm{Pos}} - \Delta C_t^{\mathrm{Neg}})_{\mathrm{GOI}}}}$$

(Pos., tagged strains; GOI, gene of interest).
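The occupancy relative to the NTR control can be computed directly from Ct values. It is E raised to (ΔCt of the tagged "Pos" strain minus ΔCt of the "Neg" sample) at the NTR, divided by the same quantity at the gene of interest. The sketch rests on three assumptions, all labelled here: ΔCt is Ct(IP) minus Ct(input), the "Neg" samples come from an untagged strain processed in parallel, and a single efficiency E applies to all primer pairs. The example Ct values are also hypothetical.

```python
def delta_ct(ct_ip: float, ct_input: float) -> float:
    """ΔCt of one region in one strain (assumed here to be Ct(IP) - Ct(input))."""
    return ct_ip - ct_input


def chip_occupancy(tagged: dict[str, tuple[float, float]],
                   untagged: dict[str, tuple[float, float]],
                   efficiency: float = 2.0) -> float:
    """Occupancy at the gene of interest (GOI) relative to the non-transcribed region (NTR).

    tagged/untagged map region ("NTR" or "GOI") -> (Ct_IP, Ct_input).
    Computes E**(dCt_Pos - dCt_Neg)_NTR / E**(dCt_Pos - dCt_Neg)_GOI.
    """
    def pos_minus_neg(region: str) -> float:
        return delta_ct(*tagged[region]) - delta_ct(*untagged[region])

    return efficiency ** pos_minus_neg("NTR") / efficiency ** pos_minus_neg("GOI")


# Hypothetical Ct values: the tagged strain's IP gains 4 cycles at the GOI only.
tagged = {"NTR": (30.0, 20.0), "GOI": (26.0, 20.0)}
untagged = {"NTR": (30.0, 20.0), "GOI": (30.0, 20.0)}
print(chip_occupancy(tagged, untagged))
```

With these placeholder values the occupancy is 16.0 (2 to the power of 4). Identical Ct values in both strains give 1.0, that is, no enrichment over the NTR.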

Cytoplasmic fractionation and RNA splicing analysis. For detection of 
unspliced mRNAs in the cytoplasm (Fig. 4c), cells were grown to log phase in 
selective medium. After collection, cells were washed once with 1 ml YPD/1M 
sorbitol/2 mM DTT and resuspended in YPD/1 M sorbitol/1 mM DTT. Cells were 
spheroplasted using Zymolyase and diluted in 50 ml YPD/1 M sorbitol to recover
for 30 min before shift to 42°C for 30 min. Cells were put on ice, centrifuged at
900g for 5 min and resuspended in 500 μl Ficoll buffer (18% Ficoll 400, 10 mM
HEPES pH 6.0). Cells were lysed by addition of 1 ml buffer A (50 mM NaCl, 1mM
MgCl2, 10mM HEPES pH 6.0). The suspension was mixed and centrifuged at
1,500g for 15 min. The supernatant was used for cytoplasmic analyses. To verify 
correct fractionation of the cytoplasmic lysates, samples were analysed by western 
blot for the presence of the cytoplasmic Zwf1 and nucleolar Nop1. RNA was
purified using a purification kit (Macherey & Nagel). Equal amounts were separated
on a 1% agarose gel (1x MOPS, 2% formaldehyde) and blotted overnight on a 
Hybond-Nylon membrane. The membrane was processed as described for the 
dot blot experiments. For detection of the GFP-containing mRNA a DIG-labelled 
probe was prepared as described for FISH analyses. 

Quantification. All experiments shown in this work were performed at least three
times independently as biological replicates, with the exception of the genome-wide
analyses. Error bars represent the standard deviation. P values shown in
Figs 1a, 2a, b, e and Extended Data Figs 3c, 4h, j were calculated using a two-tailed,
two-sample unequal variance t-test. P values shown in Fig. 3a and Extended Data
Fig. 5b, c were calculated by a two-tailed, two-sample equal variance t-test. P values
are indicated as follows: ***P < 0.001, **P < 0.01, *P < 0.05. For quantification of
cells with displayed phenotypes (Figs 2c, 4b and Extended Data Figs 5b, c, 6a and 
7d), a minimum of 20 cells was counted for each experiment. Fluorescent signal 
intensities were analysed using the Fiji software. The Hoechst signal defined the
nuclear area. Cells with an increased nuclear Cy3 signal were scored as showing
nuclear accumulation of the analysed RNA. For intensity analyses, the signal of each
RNA probe within this area was measured on a single plane to obtain the nuclear
signal, which was then related to the whole-cell signal to determine the cytoplasmic signal.
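Two calculations from this section can be sketched in plain Python. The first is the Welch statistic that underlies the two-tailed, two-sample unequal variance t-test. The second splits a cell's FISH signal into nuclear and cytoplasmic shares using the Hoechst-defined nuclear area. This is an illustration only. The p-value step is left to a statistics library; for example, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs Welch's test. The function names, the example values and the way the 20-cell minimum is enforced are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (unequal variances)."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two replicates")
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def nuclear_cytoplasmic_shares(nuclear: float, whole_cell: float) -> tuple[float, float]:
    """Fractions of a cell's probe signal inside and outside the Hoechst-defined nucleus."""
    if whole_cell <= 0 or not 0 <= nuclear <= whole_cell:
        raise ValueError("need 0 <= nuclear <= whole_cell and whole_cell > 0")
    nuclear_share = nuclear / whole_cell
    # The cytoplasmic share is whatever is not nuclear.
    return nuclear_share, 1.0 - nuclear_share


def mean_nuclear_share(cells: list[tuple[float, float]], min_cells: int = 20) -> float:
    """Average nuclear share over (nuclear, whole_cell) measurements from one experiment."""
    if len(cells) < min_cells:
        raise ValueError(f"at least {min_cells} cells must be quantified")
    return mean(nuclear_cytoplasmic_shares(n, w)[0] for n, w in cells)


# Hypothetical values: three replicates per strain, and one cell's intensities.
t, df = welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t, 3), round(df, 3))
print(nuclear_cytoplasmic_shares(250.0, 1000.0))
```

Welch's form is the appropriate one for the unequal variance test because it does not pool the two sample variances. The degrees of freedom therefore usually come out as a non-integer.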

Data availability. All datasets generated and analysed during the current study 
are available in Supplementary Information Fig. 1 and from the corresponding 
author upon reasonable request. All microarray and RIP-seq data that support the 
findings of the study have been deposited at the NCBI gene expression omnibus 
(GEO; www.ncbi.nlm.nih.gov/geo/) with the GEO accession numbers GSE83267 
(Microarray) and GSE81542 (RNA-seq).
Extended Data Figure 1 | Mex67 dissociates from poly(A) RNA
together with the adaptor proteins, which are dispensable for heat-shock
mRNA export. a, Stress leads to the dissociation of Mex67
and its adaptor proteins from bulk mRNAs. Example experiments of
immunoprecipitations from Fig. 1a with the indicated GFP-tagged or
Myc-tagged proteins are shown in western blots. The co-precipitated
poly(A)+ RNA was detected in northern blots with a DIG-labelled
oligo d(T)50 probe. b, Gbp2 precipitates after extended heat stress. Western
blot analysis for the presence of Gbp2 in the soluble supernatant fraction
or the insoluble pellet fraction is shown. c, Mex67 dissociates from mRNA,
but not from its adaptor protein. Immunoprecipitation analysis of Mex67
and the co-precipitated Npl3 is shown in western blots. Mex67-bound
poly(A)+ RNA was analysed in northern blots. d, FISH experiments of
the SSA4 heat-shock mRNA reveal export defects in mutants of MEX67-MTR2,
but not of the Mex67 adaptor proteins, upon a 30-min temperature
shift to 42°C. A DIG-labelled SSA4 RNA probe (green) detects the
heat-shock mRNA in the indicated strains. DNA was stained with Hoechst
(blue) and poly(A)+ RNA with an oligo d(T)50 probe (red). e, Overview
of the experiment shown in d. The frame indicates the enlarged single
cells in d. f, Control experiment that shows the experiment described
in d at 25°C, at which heat-shock RNAs are not produced in visible
amounts. g, The experiment was performed as described in d, with the
cells shifted to 37°C for 30 min. h, The same experiment as described
in d with a different heat-shock RNA probe against HSP12. i, The same
experiment as in d with probes against a single mRNA, GPM1 (Cy3, red)
and poly(A)+ RNA (Atto488, green). j, Control experiment that shows the
experiment described in i without the DIG-labelled probe (green) or a
Cy3 probe (red). The circle indicates the contour of the cells. k, Deletion
of all three SR-protein genes is lethal to cells. The indicated strains were 
spotted in serial dilution onto agar plates selecting either for a covering 
plasmid (plates lacking uracil, -URA) or for the loss of this plasmid 
(5-fluoroorotic acid plates, FOA). 
released upon heat shock. b, The top 50 transcripts from which indicated 
Extended Data Figure 4 | The loop domain of Mex67 is important for 
binding and export of heat-shock mRNAs. a, b, Western blots showing 
purified recombinant His6-Mtr2 and the co-precipitated recombinant 
Mex67, probed with Mex67-specific antibodies (top) and anti-His antibodies 
(bottom), as used for Fig. 2a, b (a and b, respectively). c, Scheme of the 
Mex67-Mtr2 heterodimer indicating the loop domain that is mutated 
in the experiments shown in Fig. 2 (adapted from refs 9, 48). d, The loop 
mutants of Mex67 are not degraded in vivo. Western blots are shown 
from cells cultivated at 25 °C and from cells that were shifted to 42 °C for 
30 min. e, f, FISH experiments reveal defects in the export of the heat-shock 
mRNAs SSA4 (e) or HSP12 (f) (green) in MEX67 mutants with 
defects in the loop domain upon a 30-min heat shock at 42 °C. The framed 
areas in e indicate the enlarged cells shown in Fig. 2c. g, FISH experiments 
performed as in e and f, with probes against the single housekeeping 
mRNA GPM1, show accumulation in all strains. h, Quantification of four 
different experiments shown in Fig. 2d. i, In vitro interaction study of 
Npl3 and the indicated Mex67 proteins. Recombinant Mex67 or mutant 
mex67 proteins were incubated with recombinant GST-Npl3. Subsequent 
co-immunoprecipitations of Npl3 with Mex67 are shown. j, Quantification of 
five different experiments shown in i. k, Increasing amounts of RNA, but 
not DNA, substantially reduce the interaction of Npl3 and Mex67. l, Full 
version of the western blot shown in Fig. 3b. The blot was cut into three 
pieces (black frame) and probed with the indicated antibodies. The green 
frame indicates the parts that are shown in Fig. 3b. 
Extended Data Figure 5 | Mex67 adaptor proteins cause no mRNA 
export defects when deleted, but induce defects when overexpressed or 
mutated. a, c, Deletion of the adaptor proteins results in no mRNA export 
defects, and the nuclear accumulation seen in rrp6Δ cells alone is reduced 
when adaptor proteins are also deleted. Cells were grown at 30 °C (a) or 
at 25 °C and shifted to 37 °C for 3 h (c) before FISH experiments were 
carried out with an oligo d(T)50 probe (red, top) and (a) with Cy3-labelled 
probes against an ADH1-promoter-driven GFP RNA and the U3 snoRNA 
(red, bottom). DNA was stained with Hoechst (blue). Framed areas reflect the 
enlarged cells. b, Quantification of FISH experiments shown in Fig. 4a. 
d, Deletion of NAB2 is lethal, but the simultaneous deletion of RRP6 
suppresses lethality and allows growth at 25 °C. The indicated strains are 
shown in tenfold serial dilution on plates that retain a covering plasmid 
or an empty vector (-URA) and on plates that select for the loss of the 
covering wild-type plasmid (FOA). e, The combination rrp6Δ mex67-5 
is synthetically lethal. Growth of the indicated strains with (-URA) 
or without (FOA) a covering plasmid is shown. f, Overexpression of 
mex67-5 in the indicated rrp6Δ adaptor-protein-deletion double mutants 
decreases the leakage of poly(A)+ RNA to the cytoplasm, as shown by 
FISH experiments. g, Overexpression of quality control factors from the 
strong GAL1 promoter is toxic to cells. The indicated strains are shown 
in tenfold serial dilution on the indicated plates. h, Overexpression of 
quality control factors leads to mRNA export defects and identifies them 
as mRNA retention factors. FISH with an oligo d(T)50 probe is shown in 
the indicated strains. i, Mutation in NPL3 is dominant and toxic to cells, 
as seen in serial drop tests. j, Mutation in NPL3 is dominant and causes 
mRNA export defects, as shown in FISH experiments with an oligo d(T)50 
probe. 
Extended Data Figure 6 | Heat-shock transcripts are not retained in 
the nuclei of cells defective in nuclear quality control factors. a, FISH 
experiments with a DIG-labelled SSA4 probe (green) and an oligo d(T)50 
probe (red) are shown in the indicated strains. Nuclear accumulation of … 
b, c, FISH experiments with DIG-labelled SSA4 and HSP12 probes (green) 
are shown in the indicated strains that had been shifted to 42 °C for 
30 min (b) or 1 h (c). The frames in b indicate the enlarged cells that are 
also shown in a. 
Extended Data Figure 7 | Transcripts expressed from heat-shock 
promoters are not quality-controlled. a, Representative expression 
analysis upon heat stress by qRT-PCR. Expression of GPM1 and HSP12, 
controlled by the indicated promoters, is shown upon 30-min heat stress 
at 42 °C. The endogenous SSA1 RNA served as a positive control. b, DNA 
staining of the cells shown in Fig. 4b. c, Experiment as shown in a, but with 
CYC1 and HSP12 transcripts. d, Promoter-dependent accumulation of 
transcripts in quality-control-factor mutants. All strains carrying plasmids 
with the indicated gene constructs were analysed in FISH experiments 
with oligonucleotides targeting GFP mRNA (red) and poly(A)+ RNA 
(green). DNA is stained with Hoechst (blue). Examples of typical cells 
(top) and quantification of the nuclear accumulation (bottom) are shown. 
e, The intron-containing RPL23B RNA is expressed upon heat stress when 
driven from the HSP12 promoter. qRT-PCR was carried out using lysates 
of log-phase cells that were subjected to 42 °C for 30 min. f, Schematic 
model of the retention of regular mRNAs under stress and the selective 
export of heat-shock mRNAs to the cytoplasm. Under normal conditions, 
adaptor proteins control the maturation of the transcripts and prevent 
the early association of Mex67 with Mtr2, avoiding premature nuclear 
export. Upon maturation, the adaptor proteins recruit Mex67-Mtr2, 
which allows subsequent nuclear export. Proper Mex67-Mtr2 coverage 
of the adaptor proteins is controlled at the NPC by the gatekeeper Mlp1 
(top). During stress (bottom), regular mRNAs are retained in the nucleus 
by dissociation of Mex67-Mtr2 through its adaptor proteins. By contrast, 
heat-shock mRNAs circumvent adaptor-mediated quality control by 
instant loading of Mex67-Mtr2 through Hsf1, resulting in immediate 
nuclear export. Proper RNP formation is not controlled by Mlp1 at the 
NPC because Mlp1 is detached and accumulates in nuclear foci. This 
mechanism allows a quick switch between the controlled export of correct 
mRNAs and the immediate, uncontrolled export of stress-specific 
transcripts. 
Extended Data Table 1 | Yeast strains used in this study 

Number    Genotype    Source 
HKY35    MATa ura3-52 leu2Δ1 trp1Δ63    28 
HKY36    MATα ura3-52 leu2Δ1 his3-200 
HKY56    MATa ura3-52 leu2Δ1 rip1::HIS3    this study 
HKY124    MATα ura3-52 leu2Δ1 his3Δ200 rat7-1 
HKY157    MATa ura3 leu2 his3 ade npl3::HIS3    20 
          + p CEN URA3 NPL3-myc 
HKY168    MATa ura3 leu2 trp1 his3 lys2 ade2 ade8 gbp2::HIS3 
HKY194    MATa ura3 leu2 trp1 his3 lys2 ade2 ade3 hrb1::HIS3 
HKY298    MATα ura3Δ0 leu2Δ0 his3Δ1 lys2Δ0 hrb1::kanMX4    Euroscarf 
HKY307    MATa ura3 leu2 his3 ade2 trp1 yra1::HIS3    31 
          + p CEN URA3 YRA1 
HKY369    MATa ura3Δ0 leu2Δ0 his3Δ1 lys2Δ0 gbp2::kanMX4    Euroscarf 
HKY380    MATa ura3Δ0 leu2Δ0 his3Δ1 met15Δ0 npl3::kanMX4    Euroscarf 
HKY381    MATa ura3Δ0 leu2Δ0 his3Δ1 lys2Δ0    Euroscarf 
HKY428    MATα ura3-52 leu2Δ1 his3Δ200 mtr4-G677D 
HKY642    MATa ura3 leu2 trp1 his3 ade2 mex67::HIS3    2 
          + p CEN URA3 MEX67 
HKY643    MATα ura3 leu2 trp1 his3 ade2 mex67::HIS3 
          + p CEN URA3 MEX67 
HKY644    MATα ura3 leu2 trp1 his3 ade2 mex67::HIS3    2 
          + p CEN LEU2 mex67-5 
HKY682    MATα ura3Δ0 leu2Δ0 his3Δ1 npl3::kanMX4 
HKY719    MATa his3Δ1 leu2Δ0 ura3Δ0 npl3::kanMX4 gbp2::kanMX4 hrb1::kanMX4    this study 
          + p CEN URA3 NPL3 
HKY820    MATa ura3-52 leu2Δ1 his3Δ200 GBP2-3xmyc:HIS3 
HKY821    MATa ura3-52 leu2Δ1 his3Δ200 HRB1-3xmyc:HIS3 
HKY891    MATa ura3 leu2 his3 lys2 mtr2::kanMX4    33 
          + p CEN TRP1 mtr2-21 + p CEN URA3 MTR2 
HKY892    MATa ura3 leu2 his3 lys2 mtr2::kanMX4 
          + p CEN TRP1 mtr2-33 + p CEN URA3 MTR2 
HKY907    MATa ura3 leu2 his3 mtr2::kanMX4 trp1::kanMX4 
          + p CEN TRP1 mtr2-21 
HKY931    MATa ura3Δ0 leu2Δ0 his3Δ1 trp1::kanMX4 
HKY1028    MATα leu2Δ0 his3Δ1 lys2Δ0 ura3Δ0 rrp6::kanMX4    Euroscarf 
HKY1115    MATa ura3 leu2Δ nab2::HIS3 + p CEN URA3 nab2ΔN 
HKY1203    MATα his3Δ1 leu2Δ0 met15Δ0 ura3Δ0 lys2Δ0 hrb1::HIS3 rrp6::kanMX4 
HKY1204    MATα his3Δ1 leu2Δ0 met15Δ0 ura3Δ0 ade2 ade8 trp1 gbp2::HIS3 rrp6::kanMX4 
          + p CEN URA3 GBP2-GFP 
HKY1263    MATa leu2Δ0 his3Δ1 ura3Δ0 met15Δ0 ssa4::kanMX4    Euroscarf 
HKY1266    MATa ura3Δ0 leu2Δ0 his3Δ1 met15Δ0 MEX67-GFP:HIS3MX6    Invitrogen 
HKY1310    MATa leu2Δ0 his3Δ1 lys2Δ0 ura3Δ0 rrp6::kanMX4 npl3::kanMX4    this study 
HKY1345    MATa ura3Δ0 leu2Δ0 his3Δ1 met15Δ0 ADH5-GFP:HIS3MX6    Invitrogen 
HKY1409    MATa leu2Δ0 his3Δ1 ura3Δ0 met15Δ0 mlp1::kanMX4    Euroscarf 
HKY1457    MATa ura3Δ0 leu2Δ0 his3Δ1 met15Δ0 HPR1-GFP:HIS3MX6    Invitrogen 
HKY1458    MATa ura3Δ0 leu2Δ0 his3Δ1 met15Δ0 YRA1-GFP:HIS3MX6    Invitrogen 
HKY1475    MATa ura3Δ0 leu2Δ0 his3Δ1 met15Δ0 NAB2-GFP:HIS3MX6    Invitrogen 
HKY1506    MATa ura3 leu2Δ his3Δ1 nab2::HIS3 rrp6::kanMX4    this study 
          + p CEN URA3 nab2ΔN 
HKY1568    MATa ura3Δ0 leu2Δ0 his3Δ1 met15Δ0 CYC1-GFP:HIS3MX6    Invitrogen 
HKY1569    MATa ura3 leu2 trp1 his3 ade2 mex67::HIS3    this study 
          + p CEN LEU2 mex67Δ409-435aa 
HKY1570    MATa ura3 leu2 trp1 his3 ade2 mex67::HIS3    this study 
          + p CEN LEU2 mex67KR>AA 
HKY1571    MATa ura3Δ0 leu2Δ0 rrp6::kanMX4 CYC1-GFP:HIS3MX6    this study 
HKY1572    MATa ura3Δ0 leu2Δ0 mtr4-G677D CYC1-GFP:HIS3MX6    this study 
HKY1580    MATa ura3Δ0 leu2Δ0 his3Δ1 met15Δ0 HSF1-GFP:HIS3MX6    Invitrogen 
HKY1640    MATa mex67::HIS3 CYC1-GFP:HIS3MX6 ura3Δ0 leu2Δ0    this study 
          + p CEN LEU2 mex67-5 + p CEN URA3 PCYC1:CYC1-GFP 
HKY1641    MATa mtr2::kanMX4 CYC1-GFP:HIS3MX6 trp1::kanMX4 ura3Δ0 leu2Δ0    this study 
          + p CEN TRP1 mtr2-21 + p CEN URA3 PCYC1:CYC1-GFP 
HKY1689    MATa rrp6::kanMX4 mex67::HIS3    this study 
          + p CEN LEU2 mex67-5 + p CEN URA3 MEX67 
Extended Data Table 2 | Oligonucleotides used in this study 

Number    Name    Sequence 
HK818    IDP3 forward    5'-CCACTATAACACCCGATGAGG-3' 
HK819    IDP3 reverse    5'-GGCAAACTAAGGTTACAGTTTAGC-3' 
HK1002    TDH1 forward    5'-TGCTAAGGCTGTCGGTAAGG-3' 
HK1003    TDH1 reverse    5'-TCAGAGGAGACAACGGCATC-3' 
HK1066    ADH1 forward    5'-GGTTGAACGGTTCTTGTATGGC-3' 
HK1067    ADH1 reverse    5'-CCAAAGAACCTAGACCACCAGC-3' 
HK1451    25S rRNA forward    5'-AGGCTCGTAGCGGTTCTGAC-3' 
HK1452    25S rRNA reverse    5'-CGAGCTTCTGCTATCCTGAGGG-3' 
HK1473    SSA4 forward    5'-CACCTACGCTGACAACCAACC-3' 
HK1474    SSA4 reverse    5'-CATCCTCTTCACCCACCTTCTCC-3' 
HK1475    HSP12 forward    5'-CAAGGATAACGCTGAAGGTCAAGG-3' 
HK1476    HSP12 reverse    5'-CTTCTTGGTTGGGTCTTCTTCACC-3' 
HK1477    HSP26 forward    5'-GTTGAAAGTCGTGGTTCCTGGTG-3' 
HK1478    HSP26 reverse    5'-CTGCTCTCCTTGACCTTGACCTTG-3' 
HK1494    5S rRNA forward    5'-AAAGCACCGTTTCCCGTCC-3' 
HK1495    5S rRNA reverse    5'-CACTACACTACTCGGTCAGGCT-3' 
HK1509    GRE3 forward    5'-CGGGCTCTGGAAGAATGTGTTG-3' 
HK1510    GRE3 reverse    5'-GAGGACCGAAGGAGGAGTAAGC-3' 
HK1511    SSA1 forward    5'-GTCTTCCTCCGCTCAAACTTCC-3' 
HK1512    SSA1 reverse    5'-GAACAGCAGCACCGTAAGCAAC-3' 
HK1513    EDC2 forward    5'-GCCCAATGGCAAGAAACCAAAC-3' 
HK1514    EDC2 reverse    5'-CCATCGTTGCCTTATCGTCCTC-3' 
HK1598    RPL8A forward    5'-GGCCCCAGGTAAGAAAGTCG-3' 
HK1599    RPL8A reverse    5'-GAAGGTTTCGGCAGCGGTG-3' 
HK1921    HEM15 forward    5'-AAGATGGCGTGAAGAAGGCA-3' 
HK1922    HEM15 reverse    5'-CCGCAACCTGTCAGAGACAA-3' 
HK1955    HSP104 forward    5'-GGTGAGCCAGGTATCGGTAAGAC-3' 
HK1956    HSP104 reverse    5'-CCGATGACCTTCAATTGGCCTC-3' 
HK2040    GFP reverse    5'-Cy3-CCATTAACATCACCATCTAATTCAACAAGAATTGGGACAACTCCAGTGAA-Cy3-3' 
HK2041    GFP reverse    5'-Cy3-CTTGACTTCAGCACGTGTCTTGTAGTTCCCGTCATCTTTGAAAAATATAG-Cy3-3' 
HK2055    GFP reverse    5'-Cy3-TAAAAGGACAGGGCCATCGCCAATTGGAGTATTTTGTTGATAATGGTCTGCT-Cy3-3' 
HK2098    HSE forward    5'-CCGGGCAGAACATTCTAGAAAGC-3' 
HK2099    HSE reverse    5'-CCGGGCTTTCTAGAATGTTCTGC-3' 
HK2128    GFP reverse    5'-Cy3-TTCTTTAAAATCAATACCTTTTAACTCGATTCTATTAACAAGGGTATCAC-Cy3-3' 
HK2129    GFP reverse    5'-Cy3-TCCGGGTATCTTGAAAAGCACTGAACACCATAAGTGAAAGTAGTGACAAG-Cy3-3' 
HK2134    GFP forward    5'-ATGCCCGAAGGTTATGTACAGG-3' 
HK2135    GFP reverse    5'-CATTCTTTTGTTTGTCTGCCATG-3' 
HK2138    NTR Chr.V forward    5'-GGTGCCTTCAACATTACTAGACTGC-3' 
HK2139    NTR Chr.V reverse    5'-GATATGTATATGCGCAAGAAGGTGC-3' 
HK2150    HSP12 forward    5'-CGCAGGTAGAAAAGGATTCGG-3' 
HK2151    HSP12 reverse    5'-CCTTTTTCGGCAGAGTCG-3' 
HK2154    HEM15 forward    5'-CCAGAACAATCCGTACACAAGG-3' 
HK2155    HEM15 reverse    5'-GCAATTGTCTTCTGATACTTAGCAC-3' 
HK2170    GFP reverse    5'-Cy3-CTCTTTTCGTTGGGATCTTTCGAAAGGGCAGATTGTGTGGACAGGTAATG-Cy3-3' 
HK2171    GFP reverse    5'-Cy3-CCTGTACATAACCTTCGGGCATGGCACTCTTGAAAAAGTCATGCCGTTTC-Cy3-3' 
HK2226    GPM1 reverse    5'-GGTTGGCAACAGCAGCGGCACCAGCAGCGGCAGCTTCTGGGTCCAAGTAG-Cy3-3' 
HK2227    GPM1 reverse    5'-CCAATGGAATACCAGTTGGGATGTTCAACTTAGCAATGTCAGCATCAGAG-Cy3-3' 
HK2288    U3 snoRNA    5'-Cy3-ATTCAGTGGCTCTTTTGAAGAGTCAAAGAGTGACGATTCCTATAGAAATGA-3' 
HK2354    HSP30 forward    5'-GGAGAACAAGGGCTCCAGATTG-3' 
HK2355    HSP30 reverse    5'-CCAGTAGTACTAGCGGCTAACTCG-3' 
HK2356    AHA1 forward    5'-GGACGGATCGGCATTGCC-3' 
HK2357    AHA1 reverse    5'-CACCTGAGATTCGGGCACC-3' 
HK2360    HSP42 forward    5'-GGTGTTCCACCATATGAAGGCAC-3' 
HK2361    HSP42 reverse    5'-GGTAAGTCCATTCTCGTTTCAGGC-3' 
HK2370    SGT2 forward    5'-GGTCAAAGACGCTGAATCTGC-3' 
HK2371    SGT2 reverse    5'-CGACATCAGCATCCCTTGACTG-3' 
HK2380    CTH1 forward    5'-CGATAATGATGACAGCAGCAGC-3' 
HK2381    CTH1 reverse    5'-GGTTAATCTCAACTTCGAGTGACC-3' 
HK2429    ERB1 forward    5'-GGACAAGCTTTTATCCGTTGACG-3' 
HK2439    ERB1 reverse    5'-CCATCTGCATATTTTGAGTAAATATTCGG-3' 
HK2433    NOP2 forward    5'-GGAGCAGGAAGAAATGATGGC-3' 
HK2434    NOP2 reverse    5'-GGTTCTGATGGTTATTGGTCTTGC-3' 
HK2446    PIB1 forward    5'-CAGAGATGTACGACTATCGC-3' 
HK2447    PIB1 reverse    5'-CATGCCGTACCGGATTG-3' 
Extended Data Table 3 | Plasmids used in this study 

Number    Features    Source 
pHK12    CEN URA3 PADH1:NLS-NES-GFP-GFP 
pHK20    CEN LEU2 MEX67-GFP    32 
pHK79    CEN URA3 PGAL1    this study 
pHK80    CEN LEU2 PGAL1    this study 
pHK87    CEN LEU2    38 
pHK88    CEN URA3    38 
pHK103    2µ LEU2 
pHK104    2µ URA3 
pHK154    CEN LEU2 npl3-17 
pHK144    2µ URA3 PGAL1:GFP-NPL3 
pHK291    CEN LEU2 mex67-5-ProtA 
pHK358    CEN LEU2 P41 -NAB2-GFP 
pHK367    CEN URA3 GBP2-GFP    5 
pHK422    2µ URA3 PGAL1:GBP2-GFP    24 
pHK418    CEN LEU2 GFP-NPL3 
pHK439    GST 
pHK442    GST-MEX67 
pHK477    CEN TRP1 yra1-1 
pHK531    CEN LEU2 RPB1-HA    36 
pHK765    CEN URA3 GFP-NPL3 
pHK778    CEN LEU2 9xmyc-NPL3 
pHK779    CEN URA3 9xmyc-NPL3 
pHK892    CEN URA3 P4pPRP17-GFP 
pHK927    CEN URA3 P4pyHRB1-GFP 
pHK1221    2µ URA3 PGAL1:MLP1 
pHK1276    GST-NPL3    2 
pHK1279    P >: HIS: MTR2    this study 
pHK1372    Prec: HIS: TEV: MTR2:MEX67    4 
pHK1373    P rpc: HIS: TEV: MTR2:mex67loopKR>AA    2 
pHK1374    P rpc: HIS: TEV: MTR2:mex67-409-435aaK343E 
pHK1376    CEN LEU2 mex67Δ409-435aa    2 
pHK1377    CEN LEU2 mex67loopKR>AA (K415A, K416A, K419A, K424A, R426A, R427A) 
pHK1443    CEN URA3 PCYC1:CYC1-GFP    this study 
pHK1444    CEN URA3 PHSP12:CYC1-GFP    this study 
pHK1445    CEN URA3 PHSP12:HSP12-GFP    this study 
pHK1464    CEN URA3 Peycr suse: CYC1-GFP    this study 
pHK1470    CEN URA3 PGPM1:GPM1-GFP    this study 
pHK1472    CEN URA3 PHSP12:GPM1-GFP    this study 
pHK1517    CEN URA3 PCYC1:HSP12-GFP    this study 
pHK1518    CEN URA3 PGPM1:HSP12-GFP    this study 
pHK1547    CEN URA3 PHSP12:RPL23B-GFP    this study 
pHK1548    CEN URA3 PRPL23B:RPL23B-GFP    this study 
pHK1553    CEN URA3 Pepa -nse:GPM1-GFP    this study 
Near-atomic-resolution cryo-EM analysis of the Salmonella T3S injectisome basal body 

L. J. Worrall, C. Hong, M. Vuckovic, W. Deng, J. R. C. Bergeron, D. D. Majewski, R. K. Huang, T. Spreter, B. B. Finlay, 
Z. Yu & N. C. J. Strynadka 


The type III secretion (T3S) injectisome is a specialized protein 
nanomachine that is critical for the pathogenicity of many 
Gram-negative bacteria, including purveyors of plague, typhoid 
fever, whooping cough, sexually transmitted infections and 
major nosocomial infections. This syringe-shaped 3.5-MDa 
macromolecular assembly spans both bacterial membranes and 
that of the infected host cell. The internal channel formed by the 
injectisome allows for the direct delivery of partially unfolded 
virulence effectors into the host cytoplasm’. The structural 
foundation of the injectisome is the basal body, a molecular lock-nut 
structure composed predominantly of three proteins that form 
highly oligomerized concentric rings spanning the inner and outer 
membranes” °. Here we present the structure of the prototypical 
Salmonella enterica serovar Typhimurium pathogenicity 
island 1 basal body, determined using single-particle cryo-electron 
microscopy, with the inner-membrane-ring and outer-membrane-ring 
oligomers defined at 4.3 Å and 3.6 Å resolution, respectively. 
This work presents the first, to our knowledge, high-resolution 
structural characterization of the major components of the basal 
body in the assembled state, including that of the widespread class 
of outer-membrane portals known as secretins. 

Using single-particle cryo-electron microscopy (cryo-EM), we determined 
the structure of a secretion-incompetent basal body mutant, 
lacking the N-terminal cytoplasmic domain of PrgH? (PrgH139-392), at 
a resolution of 6.3 Å (Fig. 1a and Extended Data Fig. 1a-d, g). In our 
structure, the periplasmic gate is in a closed conformation (Fig. 1c) 
and, by design, no density is attributable to the inner rod or needle, 
neither of which was detected by liquid chromatography-tandem 
mass spectrometry (LC-MS/MS) (Extended Data Fig. 2a). The inner-membrane 
rings (Salmonella pathogenicity island 1 (SPI-1) proteins 
PrgH and PrgK) have the best local resolution, whereas the export 
apparatus components they encompass and the outer-membrane pore 
(InvG, a member of the secretin family that is also found in the type II 
secretion system (T2SS), type IV pilus system and the phage-assembly 
system®) are of lower resolution (Extended Data Fig. 1i). Imposing 24-fold 
symmetry*’” improved the global resolution to 4.3 Å (Extended 
Data Fig. 1e, f, h, j), resolving side chains for the inner-membrane 
rings (Fig. 1b and Extended Data Fig. 1k) and allowing for models 
of PrgH171-364 and PrgK59_293 to be built. To probe the structure of 
the InvG secretin further, we purified the intact protein alone’, with 
cryo-EM analysis giving an unsymmetrized map at 4.4 Å resolution 
(Extended Data Fig. 3a-e) and a map at 3.6 Å with clear 15-fold symmetry 
imposed (this allowed the structurally uncharacterized N3, secretin 
and S domains (InvG172-557) to be built; Fig. 1b and Extended 
Data Fig. 3f-i). Superimposition with the basal body map confirms 
highly similar overall features (Fig. 1c), indicating that no major 
conformational artefacts were introduced during secretin isolation. 
In the absence of the inner-membrane rings, the terminal N0 
and N1 domains of InvG were not resolved; we therefore employed 
Rosetta electron-microscopy-guided symmetrical modelling” 
to position the previously determined X-ray structure’ in the 6.3 Å 
basal body map (Fig. 1d). 

The inner-membrane proteins PrgH and PrgK form concentric rings 
(Fig. 1d), consistent with our earlier Rosetta electron-microscopy- 
guided models” * but with more closely packed homo- and hetero- 
oligomerization interfaces (Extended Data Fig. 2b-e). Notably, the 
uninterrupted sequence of PrgK»9-293 is observed here, including two 
previously unresolved regions that we now understand to be central 
to the assembly of PrgH and PrgK. First, the linker connecting the D1 
and D2 domains (Gln76-Pro98) packs between neighbouring PrgK 
D2 domain helices, with conserved residue Phe89 (the mutation of 
which abrogates self-association*) inserted in a zipper-like fashion 
around the assembled ring (Extended Data Fig. 2d). Second, the loop 
connecting the PrgK D2 domain and the C-terminal transmembrane 
helix forms extensive interactions at the interface of neighbouring PrgK 
and PrgH monomers (Extended Data Fig. 2e), a finding supported by 
earlier cross-linking data’. A region of this loop and the C-terminal 
transmembrane helix are absent in some bacterial variants, such as that 
of the enteropathogenic Escherichia coli (EPEC) protein EscJ (terminates 
at the PrgK195 equivalent). Although it still allows for secretion, a 
PrgKj_299 mutant destabilizes the needle complex’, suggesting that this 
region contributes to a more robust assembly, one that may be required 
depending on the particular environmental niche of the pathogen. 

Localized within the bacterial outer membrane, secretins are large 
portals to the extracellular environment essential in four distinct bacterial 
secretion systems®. The structure of InvG172-557 is a massive 
double-layered β-barrel (Fig. 2 and Extended Data Fig. 4). The central 
secretin domain (residues 302-518) forms an eight-stranded β-sandwich 
that is splayed apart at the extracellular end (Fig. 2c) and is highly 
conserved at the packed core (Extended Data Figs 5, 6). The outer 
β-sheet (strands 1, 3, 8 and 9) forms a 60-stranded anti-parallel 
β-barrel (with a shear of 60) that constitutes the outer wall; the inner sheet 
(strands 4-7) forms a kinked anti-parallel barrel (60 strands; shear 0) 
that both buttresses the outer wall and forms the inner ‘periplasmic 
gate’ (Fig. 2b, c). At the extracellular face of the outer wall, strands 
1 and 2 form a hairpin that, along with a kinked strand 3, constitutes an 
angled 45-stranded amphipathic β-barrel ‘lip’, decorated on the exterior 
by a highly conserved amphipathic helical loop that connects strands 
8 and 9 (Extended Data Figs 5b, 6). The preceding N3 domain abuts 
the periplasmic base of the secretin domain. It belongs to the family 
of small mixed α/β modular domains that we previously termed 
ring-building motifs (RBMs)*, common to proteins of the injectisome? 
and other secretion systems’. These domains share an overall superficial 
architecture (divided into two groups on the basis of secondary-structure 
connectivity) and, on the basis of prior modelling, a broadly conserved 
ring-packing arrangement is predicted’. The structure of the 15-mer 


Figure 1 | Cryo-EM structures of the injectisome basal body and 
isolated secretin. a, Reconstruction of the PrgH139-392 basal body (C1) 
at 6.3 Å resolution. b, Reconstructions of the 15-fold averaged isolated 
secretin (top: blue; 3.6 Å resolution) and the 24-fold averaged inner-membrane 
rings of the basal body map (bottom: PrgH, green; PrgK, 
orange; 4.3 Å resolution). c, Central slice view of the basal body reconstruction 
(dark grey, contoured as in a; light grey, contoured at a lower level 
to highlight less-ordered features) and isolated secretin (blue). The 
domain annotation of PrgH, PrgK and InvG is overlaid on the left and 
the previously solved structures of the monomeric domains on the right. 
The PrgH cytoplasmic D1 domain (green, bottom left) is not present in 
the PrgH139-392 mutant used in this study and its precise location with 
respect to the basal body is unclear. The transmembrane helices of PrgH 
(N-terminal) and PrgK (C-terminal) and the PrgK N-terminal lipidation 
are present but diffusely ordered. d, Refined structures for InvG172-557 
(blue), PrgH171-364 (green), PrgK29-203 (orange) and Rosetta-modelled 
InvG34-171 (pale blue). One monomer encompassing InvG34-557 is coloured 
according to structural domains: medium blue, N0-N1 domains; cobalt 
blue, N3 domain; cyan, outer β-sheet; green, inner β-sheet; orange, 
secretin domain lip; red, S domain (note the displaced interaction with the 
β-sheet of the i+1 and i+2 protomers). 


N3 domain here (as well as the 24-mer RBMs of PrgH and PrgK) val- 
idates this oligomerization mechanism and highlights the substantial 
network of interactions that these domains make to the overall assembly 
and stability of the injectisome. A β-hairpin specific to T3S secretin 
N3 domains (residues 193-206) packs against the underside of the 
β-sheets of the periplasmic gate (Fig. 2c). C-terminal to the secretin 
domain, the S domain (residues 526-end) forms a helix-turn-helix 
motif that extends laterally across neighbouring protomers at the outer 
midsection of the secretin pore (Figs 1d, 2). 
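The barrel sizes quoted above follow from the 15-fold symmetry of the InvG pore. As a worked check, assuming (our reading, not a statement in the text) that each protomer contributes each listed strand exactly once to the corresponding barrel, the strand counts are simple products:

```latex
\begin{aligned}
n_{\text{outer}} &= 15 \times 4 \;(\text{strands } 1, 3, 8, 9) = 60\\
n_{\text{inner}} &= 15 \times 4 \;(\text{strands } 4\text{-}7) = 60\\
n_{\text{lip}}   &= 15 \times 3 \;(\text{strands } 1, 2, 3) = 45
\end{aligned}
```

The same arithmetic would predict different barrel sizes for secretins with other stoichiometries, which is one way the proposed 12- to 14-fold variants could be cross-checked against lower-resolution maps.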

The InvG secretin structure unambiguously confirms the pentadecameric 
stoichiometry previously suggested in 10 Å resolution cryo-EM 
maps of the Salmonella SPI-1 injectisome®. Variation in the symmetry 
of other secretins, ranging from 12- to 14-fold, has been proposed, 
but is unexplained by our structure-guided sequence analysis, with the 
double-layered barrel well conserved and most variation involving loop 
regions that have limited involvement in oligomerization (Extended 
Data Fig. 6). Similarly, comparison of previous electron microscopy 
reconstructions does not reveal a relationship between proposed stoichiometry 
and secretin dimensions, with the InvG 15-mer here fitting 
closely to lower-resolution maps from other species (Extended Data 
Fig. 7d, e). 

The InvG structure provides a molecular framework that allows us 
to understand the features that contribute to function of the secretin 
family, including assembly, localization, membrane association and 
gating. It also provides context to the wealth of published biochemical, 
biophysical and mutational in vitro and in vivo experimental data. 

The observed InvG domains contribute extensively to oligomerization, 
a property that is mediated by a substantial hydrophobic interface 
(a solvation free-energy gain (ΔiG) of −45 kcal mol⁻¹ (ref. 11) and a 
surface area of approximately 4,700 Å² per monomer), in keeping with 
the highly thermostable and denaturation-resistant signatures of these 
pores in the assembled state!*!°. In addition to the extensive β-strand 
hydrogen-bond network of the central secretin domain (Extended 
Data Fig. 7a), the smaller N3 and S domains also make notable 
contributions to oligomerization, and their disposition is suggestive of a 
concerted role in assembly (Extended Data Fig. 7b, c). The N3 RBM 
forms not only multivalent homo-oligomeric ring contacts (with an 
interface area of ~770 Å² per monomer), but also interfaces with the 
β-sandwich of the adjacent secretin domain (~550 Å² per monomer), 
which explains why it is essential in oligomerization (as opposed to the 
more N-terminal N0, N1 and N2 domains, which have been shown 
to be dispensable for secretin assembly in PulD)™. As validation, 
PulD oligomerization-deficient mutants!* map to both the observed 
N3 self-association and N3-secretin domain interfaces (Extended 
Data Fig. 5d), while the Leu293Arg mutation of InvG at the core of the 
N3 ring interface reduces secretion and stability (Extended Data 
Fig. 5f). Notably, the N3 self-association interface is largely hydrophobic 
(ΔiG = −16.4 kcal mol⁻¹), typical of a very stable complex. By comparison, 
the analogous RBM interface formed by the more peripheral 
InvG N1 domain (ΔiG = −3.6 kcal mol⁻¹) is more polar and loosely 
associated, suggestive of plasticity in basal body assembly. 
The differing numbers of N-terminally disposed domains among the 
secretin class (T2SS PulD has an additional module (N2) compared 
to type III secretion system (T3SS) variants, for example) suggest the 
need for a customized periplasmic span that couples each particular 
secretin to the varying inner-membrane components of these systems. 

The C-terminal S domain wraps around the secretin midsection, 
forming interactions with an extended hydrophobic surface on the 
exterior of the β-barrel that spans over two successive monomers 
(an interface area of ~1,500 Å²) in a stabilizing role, akin to that of a 
molecular staple (Extended Data Fig. 7b, c). PulD multimerization-deficient 
mutants'° map to this interface (Extended Data Fig. 5d), supporting a 
role for this region in assembly. The S domain is also the site of interac- 
tion with cognate chaperones known as pilotins: small outer-membrane 
targeted lipoproteins that are involved in secretin localization, assembly 
and outer-membrane insertion"®. Pilotins are structurally diverse but 
appear to interact with their respective secretin in a conserved manner, 
as illustrated by the structures of the Shigella flexneri T3S MxiM"” and 
the Dickeya dadantii T2S OutS'® pilotins in complex with secretin S 
domain peptides. In both, a helical structure of the peptide forms only 
upon binding to the hydrophobic surface of the pilotin, a common 
disorder-order transition that, according to our data, also occurs for 
the complex of InvG and the Salmonella pilotin InvH (Extended Data 
Fig. 8c). Furthermore, the high affinity of the secretin-pilotin interaction 
(Extended Data Fig. 8d) is consistent with the pilotin-mediated 
outer-membrane targeting of InvG monomers, potentially with assistance 
from the bacterial Lol lipoprotein-sorting pathway. The InvG S domain 
and the S domain peptide of the S. flexneri secretin MxiD share a 
common turn-helix motif (Extended Data Fig. 8a) and superimposition 
localizes the pilotin to the exterior of the InvG multimer (Extended 
Data Fig. 8b). Deletion of this motif (residues 542-562) in InvG results 
in the same secretion-deficient phenotype as removal of the complete 
S domain (residues 521-562) (Extended Data Fig. 8g). Mutation at 
Figure 2 | Structure of the InvG secretin pore. a-c, InvG172-557 secretin 
pore structure viewed from the side (a), from the top (outer-membrane 
perspective; b) and as a monomer (c). Domains are coloured 
according to structural features: blue, N3 oligomerization ring domain; 
cyan, secretin domain outer β-sheet; green, secretin domain inner β-sheet; 
gold, secretin membrane association, AHL and membrane insertion 
domain lip; red, C-terminal pilotin-binding S domain. Secretin domain 
β-strand numbers and functional roles of domains are indicated in c. 


the modelled pilotin interface results in an approximately 500-fold 
reduction in binding affinity of the isolated S domain and InvH, with 
a corresponding diminished secretion phenotype (Extended Data 
Fig. 8d, e, g). Together, the data support a stabilizing role for the pilotin: 
it acts as a molecular ‘stapler’ that mediates the folding of the S domain, 
which is necessary to allow the observed clamping interactions across 
two proceeding secretin protomers (i+ 1 and i+2) during assembly 
(Fig. 1d and Extended Data Fig. 7b). This is consistent with the obser- 
vation that the T2SS pilotin PulS substantially catalyses the initial stages 
of PulD folding'®. Previous in situ tomography analysis of S. flexneri, 
together with this latter work, suggests that the pilotin may stay asso- 
ciated with secretin in the assembled injectisome”° (Extended Data 
Fig. 9d). Collectively, our observed contributions of both the ring-forming 
N3 and staple-like S domains to oligomerization indicate that each has 
akey role in efficient secretin assembly and stability. 
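The ~500-fold loss of affinity can be expressed as a binding free-energy penalty through ΔΔG = RT ln(K_D,mutant / K_D,wild type). The sketch below assumes that the fold change refers to the dissociation constant measured at the 25 °C ITC temperature given in Methods; under that assumption, 500-fold corresponds to roughly 3.7 kcal mol⁻¹ (about 15 kJ mol⁻¹). The function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1
R_KJ = 8.314462e-3    # gas constant, kJ mol^-1 K^-1


def ddg_from_fold_change(fold_change: float, temperature_k: float = 298.15,
                         gas_constant: float = R_KCAL) -> float:
    """Binding free-energy penalty for a given fold increase in K_D.

    ddG = R * T * ln(K_D,mut / K_D,wt); positive values mean weaker binding.
    """
    if fold_change <= 0:
        raise ValueError("fold change must be positive")
    return gas_constant * temperature_k * math.log(fold_change)


if __name__ == "__main__":
    # ~500-fold weaker S domain-InvH binding, evaluated at 25 degrees C.
    print(f"{ddg_from_fold_change(500):.2f} kcal/mol")
    print(f"{ddg_from_fold_change(500, gas_constant=R_KJ):.1f} kJ/mol")
```

Because ΔΔG depends on the logarithm of the fold change, even a large change in affinity corresponds to only a few kcal mol⁻¹, on the order of the loss of a small number of favourable side-chain contacts.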

Secretins from several systems!*!”? have been shown to insert spontaneously into the bacterial outer membrane, independently of the general β-barrel assembly machinery (BAM). Biophysical data, including low-resolution electron microscopy of T2S PulD, helped to further define this as a stepwise process: outer-membrane attachment of a ‘pre-pore’, followed by insertion of the folded pore into the membrane'*”?". We can now structurally define the pre-pore as a partially ordered oligomer composed of a stabilized core of the N3, secretin and S domains described above, but with unstructured distal (outer-membrane) secretin domain β-sheets (Extended Data Fig. 9c). The structure of InvG delineates the structural motifs implicated in these distinct membrane-association and insertion steps. The only substantial hydrophobic surface is localized at the outer face of the kinked β-barrel lip, with a span and composition consistent with those of a membrane-spanning domain (Fig. 3a–c). A diffuse ring of density surrounds this region, similar to the detergent/lipid belt around transmembrane domains of other proteins imaged by cryo-EM.
A notable feature of the transmembrane domain is the dramatic angle of the β-strands, which are kinked by approximately 45° relative to the outer-membrane surface normal (Fig. 3a). This matches well with the invaginated nature of the outer membrane in recent in situ tomography studies of various T3SSs*°”° (Extended Data Fig. 9d). Decorating the transmembrane lip in our secretin structure is an amphipathic helical loop (AHL), the most conserved sequence in the secretin family, which presents its hydrophobic face towards the inner leaflet of the outer membrane (Fig. 3a, c, d). Amphipathic helices are commonly exploited for membrane association; examples include monotopic membrane proteins and peptide antibiotics, in which the hydrophobic side chains attach to, and partially penetrate, a single leaflet of the lipid bilayer. Mutants of the transmembrane β-lip and AHL still localize to the membrane (the AHL mutant to a lesser extent), a process that is probably mediated by the pilotin. Their ability to form SDS-resistant oligomers was, however, severely affected, indicating that these mutations disrupt membrane association and insertion of the secretin pore” (Extended Data Fig. 9a), with a corresponding reduction in secretion (Fig. 3e). PulD mutants also map to the AHL (Ile569Ser and Phe573Leu, equivalent to InvG residues Ile493 and Phe497; Fig. 3d and Extended Data Figs 5d, 6a) and were shown to affect mislocalized insertion into the inner membrane’, multimerization kinetics!* and permeability (for PulD(Phe573Leu))”°. Taken together, these data lead us to propose that AHL-mediated monotopic interactions with the outer membrane act as anchors that localize secretin monomers and facilitate assembly and oligomerization of the intimately associated N3 ring and secretin domain β-strands of the pre-pore state. This in turn provides a stabilized β-strand alignment of the secretin domain for the final ‘zipping up’ of the extensively hydrophobic β-barrel lip, a process that is required for full membrane insertion and span of the assembled, hyperstable pore (Extended Data Fig. 9b). The AHL could further affect local physical properties of the membrane (as shown for other amphitropic proteins”’), such as fluidity or curvature. These membrane properties have been proposed to be involved in PulD membrane insertion!?*.

Figure 3 | Structural elements mediating secretin membrane association and insertion. a, Slab view of the secretin with the AHL (gold), which we propose mediates the initial monotopic association with the inner leaflet of the outer membrane, and the amphipathic kinked β-barrel lip (orange), which we propose allows subsequent full BAM-independent insertion of the pore across the bilayer. Aromatic and positively charged residues are shown as spheres on monomers to the left. The density we attribute to diffusely ordered detergent and lipid in the C1 secretin map is shown as a grey surface. b, Solvent-accessible surface of the secretin (including N0 and N1 domains) coloured according to residue type: pale orange, hydrophobic; cyan, polar; blue, positively charged; red, negatively charged; green, glycine; light green, proline. c, Close-up view of the proposed membrane-spanning region of InvG formed by the kinked region of strands 1, 2 and 3a (orange; residues 321–358) and the conserved AHL that links strands 8 and 9 (gold; residues 483–498). Conserved residues from d are labelled, including PulD multimerization mutants Ile569Ser (InvG Ile493 equivalent; solid square) and Phe573Leu (InvG Phe497 equivalent; solid star). d, Sequence alignment of the conserved AHL. T3SS and non-T3SS secretins are highlighted green and orange, respectively. MxiD, S. flexneri; YscC, Y. enterocolitica; PscC, P. aeruginosa; EscC, EPEC; SpiA, Salmonella SPI-2; GspD, Vibrio cholerae T2SS; PulD, Klebsiella oxytoca T2SS; pIV, phage f1; PilQ, Neisseria meningitidis type IV pilus system. Highly conserved residues are labelled and shown in c. e, InvG complementation assay for membrane-association mutants. Single alanine mutants in the AHL had no effect on secretion; Phe486Ala and Pro491Ala are shown as examples. Mutation of multiple conserved and hydrophobic residues in the AHL reduced secretion substantially, as did mutation of residues in the transmembrane β-lip (Val337 and Leu339; labelled in c). Lysate InvG protein levels, detected using anti-InvG antibody, are indicated below the main panel.
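The AHL is characterized above by the segregation of hydrophobic and polar side chains onto opposite faces of a helix. A common way to quantify such amphipathicity is Eisenberg's mean hydrophobic moment, which treats each residue's hydrophobicity as a vector pointing in the direction of its side chain around an ideal α-helix (100° per residue). The sketch below illustrates that generic technique and is not an analysis performed in this study; because the InvG AHL sequence is not reproduced in this section, the usage example uses a synthetic helix, and the function name is hypothetical.

```python
import math

# Eisenberg consensus hydrophobicity scale (normalized values).
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def hydrophobic_moment(seq: str, angle_deg: float = 100.0) -> float:
    """Mean hydrophobic moment of `seq` modelled as an ideal helix.

    Residue i contributes a vector of length h_i (its hydrophobicity)
    pointing at angle i * angle_deg around the helix axis; the result is
    the length of the vector sum divided by the number of residues.
    """
    if not seq:
        raise ValueError("sequence must not be empty")
    delta = math.radians(angle_deg)
    residues = seq.upper()
    x = sum(EISENBERG[aa] * math.cos(i * delta) for i, aa in enumerate(residues))
    y = sum(EISENBERG[aa] * math.sin(i * delta) for i, aa in enumerate(residues))
    return math.hypot(x, y) / len(residues)


if __name__ == "__main__":
    # Synthetic 18-residue helix: Leu on one face, Lys on the other.
    amphipathic = "".join(
        "L" if math.cos(math.radians(100 * n)) > 0 else "K" for n in range(18)
    )
    print(amphipathic, round(hydrophobic_moment(amphipathic), 2))
    # A uniformly hydrophobic 18-mer has no preferred face.
    print(round(hydrophobic_moment("L" * 18), 2))
```

A high moment indicates a helix with one clearly hydrophobic face, the property that would let a segment like the AHL lie along a single leaflet rather than span the bilayer.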

A prominent feature of secretins, as imaged by electron microscopy, is the periplasmic gate, a barrier that ensures selective passage of solutes. The gate in our structure is formed by the radial projections of β-strands 4 and 5 of the inner β-sheet, which extend into the secretin lumen, creating a central pore approximately 15 Å across (Fig. 2b, c), consistent with the small-molecule permeability (less than ~600 Da) of secretins from the T2SS, T3SS and type IV pilus system”°5”°. The pore is lined with the conserved residues Glu396 and Arg397, creating rings of opposite charge on the extracellular and periplasmic faces, respectively. Mutation of these residues, or of surrounding hydrophobic residues involved in hairpin packing, considerably affects secretion (Extended Data Fig. 10b, c). Furthermore, phage secretin pIV point mutants that lead to increased permeability map to the inner β-sheet and N3 domain” (Extended Data Fig. 10b), supporting the involvement of both in gate opening. The T3SS-specific N3 hairpin insertion packs against the underside of the inner β-sheet, and mutation of the conserved Arg198 and Asp199 at the hairpin tip reduces or eliminates secretion (Extended Data Fig. 10b, c). These residues appear to support the closed gate, with Arg198 forming a salt bridge with residues at the kink of the gate hairpin and Asp199 projecting up towards His403 on the underside of the gate. On the basis of these observations, we propose that the N3 domain has a role in gating by stabilizing the periplasmic gate in a closed position.

Comparison of Salmonella SPI-1 cryo-EM reconstructions®**° shows that the assembly of the injectisome needle filament involves an outward reorientation of the region that we can now attribute to the N3 domain, which, in our closed basal body, is tilted into the secretin lumen (Extended Data Fig. 10a). Substrate-induced gate opening through interaction with the N3 domain has been proposed for the T2SS on the basis of low-resolution electron microscopy*". By analogy with the type IV pilus system, in which the pilus (with a span and characteristics similar and complementary to those of the SPI-1 needle structure*’) passes through the secretin PilQ*’, rod/needle assembly within the injectisome secretin lumen would bring the rod/needle into contact with the tightly packed N3 ring (Extended Data Fig. 10a, d–f). We envisage that this substrate-induced pushing of the N3 domain, and the concomitant disruption of the interface between the N3 domain and the secretin domain inner β-sheet, could provide the trigger for reorientation of the periplasmic gate strands and assembly of the conduit in these various secretion systems.
METHODS


Basal body purification. For basal body purification, a non-flagellated ΔprgH Salmonella strain** was used, complemented with a plasmid encoding thrombin-cleavable, N-terminally 6×His–GFP-tagged PrgH130–392, based on the prgH deletion plasmids described previously**. The PrgH130–392-complemented Salmonella strain was transformed with a pBAD plasmid expressing the T3SS transcriptional activator HilA. Bacteria were grown with mild aeration in LB broth supplemented with 0.3 M NaCl to an A600 of approximately 0.5; arabinose was then added to a final concentration of 0.015% and the cultures were grown overnight at 30 °C. Cells were resuspended in 0.1 M Tris (pH 8.0), 0.75 M sucrose in the presence of lysozyme (0.4 mg ml⁻¹) and EDTA (1 mM), and incubated with stirring. Cells were lysed with 1% lauryldimethylamine oxide (LDAO) in the presence of protease inhibitor tablets (Roche) before 10 mM MgCl2 and 500 mM NaCl were added to the lysate. Cell debris was removed by low-speed centrifugation and basal bodies were pelleted by high-speed centrifugation (Beckman, 45Ti rotor, 40,000 r.p.m., 45 min, 4 °C). The pellet was resuspended in 0.5% LDAO in 10 mM Tris-HCl pH 8, 0.5 M NaCl, 5 mM EDTA and adjusted to a final concentration of 30% (w/v) CsCl. Samples (5 ml) were then centrifuged for 16–20 h at 25,000 r.p.m. in a Beckman SW41 rotor. Fractions containing basal bodies were combined and desalted into 10 mM Tris (pH 8.0), 0.5 M NaCl, 0.2% LDAO before purification using nickel-charged Chelating Sepharose Fast Flow (GE Healthcare). Purified protein was concentrated and incubated with thrombin overnight to cleave the N-terminal His–GFP tag. The sample was further concentrated before gel filtration on a Superose 6 column (GE Healthcare) equilibrated in 10 mM Tris (pH 8.0), 0.5 M NaCl, 0.2% LDAO.
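Centrifugation steps in these Methods are quoted both in r.p.m. for named rotors and in multiples of g. The standard conversion, RCF = 1.118 × 10⁻⁵ × r × N² (r in cm from the rotor axis, N in r.p.m.), allows the two to be compared. The sketch below uses an arbitrary 10 cm radius purely for illustration; the actual radius of the 45Ti or SW41 rotor is not given here and must be taken from the manufacturer's specifications. Function names are hypothetical.

```python
import math

RCF_CONSTANT = 1.118e-5  # (2*pi/60)^2 / 980.665 cm s^-2, per cm per r.p.m.^2


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) at radius_cm from the axis."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Inverse conversion: rotor speed needed to reach a given RCF."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    # Illustrative radius only; use r_min, r_av or r_max of the actual rotor.
    radius = 10.0
    print(round(rcf_from_rpm(40_000, radius)))
    print(round(rpm_from_rcf(16_100, radius)))
```

Because RCF scales with the square of the speed, the same r.p.m. setting corresponds to quite different forces in rotors of different geometry, which is why g values are the more portable description.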

To isolate the secretin, purified protein eluted from the nickel-charged Chelating 
Sepharose Fast Flow resin was incubated with 100 mM disodium hydrogen phos- 
phate (pH 10.5) for 1h, buffer was exchanged and passed back over the Chelating 
Sepharose Fast Flow resin before the final gel-filtration step. 

Cryo-electron microscopy. Aliquots of 2.5 µl of the purified basal body complex and the isolated secretin were applied to glow-discharged (30 s on each side) Quantifoil grids (gold, 400 mesh, R1.2/1.3). The grids were blotted for 3 s at 80% humidity and plunge-frozen into liquid ethane using a Vitrobot Mark IV (FEI Company). Grids were imaged on a 300 keV Titan Krios cryo-electron microscope (FEI Company) equipped with a spherical aberration corrector, an energy filter (Gatan GIF Quantum) and a post-GIF Gatan K2 Summit direct electron detector. Images were taken on the K2 camera in dose-fractionation mode at a calibrated magnification of 29,240, corresponding to 1.71 Å per physical pixel (0.855 Å per super-resolution pixel). The dose rate on the specimen was set to 3.4 electrons per Å² per s and the total exposure time was 18 s, resulting in a total dose of 62 electrons per Å². With dose fractionation set at 0.375 s per frame, each movie series contained 48 frames and each frame received a dose of 1.3 electrons per Å². An energy slit with a width of 20 eV was used during data collection. Fully automated data collection was carried out using SerialEM with a nominal defocus range set from −1.5 to −3 µm (ref. 35).
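The imaging parameters above are linked by simple arithmetic: the specimen-level pixel size is the detector pixel divided by the magnification, and the frame count and per-frame dose follow from the dose rate, exposure time and frame length. The sketch below reproduces that bookkeeping. The 5 µm physical pixel of the K2 Summit is an assumption supplied here (the manufacturer's nominal value), not a number stated in the Methods, and the function names are hypothetical.

```python
def pixel_size_angstrom(detector_pixel_um: float, magnification: float) -> float:
    """Specimen-level pixel size in Angstrom (1 um = 10,000 Angstrom)."""
    return detector_pixel_um * 1e4 / magnification


def dose_schedule(dose_rate: float, exposure_s: float, frame_s: float):
    """Return (number of frames, dose per frame, total dose).

    dose_rate is in electrons per A^2 per s at the specimen.
    """
    n_frames = round(exposure_s / frame_s)
    return n_frames, dose_rate * frame_s, dose_rate * exposure_s


if __name__ == "__main__":
    K2_PIXEL_UM = 5.0  # assumed nominal physical pixel of the K2 Summit
    physical = pixel_size_angstrom(K2_PIXEL_UM, 29_240)
    print(f"{physical:.3f} A/pixel, {physical / 2:.4f} A/super-resolution pixel")
    frames, per_frame, total = dose_schedule(3.4, 18.0, 0.375)
    print(frames, f"{per_frame:.2f}", f"{total:.1f}")
```

With these inputs the calculation gives 48 frames of about 1.3 electrons per Å² each and 61.2 electrons per Å² in total, in line with the reported values to within the rounding of the nominal dose rate.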

Image processing. For the basal body complex dataset, a total of 2,515 movie series were collected in super-resolution mode (0.855 Å per pixel). Motion correction was performed using Unblur*®, with data binned by two (1.71 Å per pixel after binning), and all 48 frames were aligned and summed to a single dose-filtered micrograph using Sum_movie for further processing. The contrast transfer functions (CTFs) of the summed micrographs were determined using CTFFIND4 (ref. 37). Approximately 2,000 particles were manually boxed out from selected micrographs to generate an initial model from reference-free 2D class averages in EMAN2 (ref. 38). The representative 2D class averages were also used as templates for automated particle picking of the entire dataset in Relion®’. Of the ~263,900 automatically picked particles, ~159,700 were retained in good class averages after reference-free 2D classification in Relion. 3D classification in Relion generated one good 3D map out of four, with ~67,800 particles. 3D refinement was performed with these 67,800 particles using Frealign*°, with and without imposing 24-fold symmetry. The refinement resolution limit was set to 8 Å in Frealign, so that no information beyond 8 Å was used in the refinement. Fourier shell correlations (FSCs) calculated from unmasked maps in Frealign at the 0.143 criterion reported 6.3 Å resolution for the C1 map and 4.3 Å resolution for the C24 map using this frequency-limited refinement procedure. Final B-factor sharpening of the maps was performed using Bfactor.exe (Grigorieff laboratory). Local resolution was estimated using ResMap" with the unfiltered and unmasked C1 and C24 maps, showing that certain regions are better resolved than the overall resolution.
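Both datasets report resolution as the point where the Fourier shell correlation falls to 0.143. For two maps with Fourier coefficients F1 and F2, FSC(k) = Re ΣF1·F2* / sqrt(Σ|F1|² · Σ|F2|²), with each sum taken over the shell at spatial frequency k. Frealign and Relion compute this curve internally; the sketch below only illustrates how a resolution value is read off a tabulated curve by linear interpolation at the threshold. The function name and the synthetic curve are hypothetical.

```python
from typing import Sequence


def resolution_at_threshold(freqs: Sequence[float], fsc: Sequence[float],
                            threshold: float = 0.143) -> float:
    """Resolution (Angstrom) where an FSC curve first drops below threshold.

    freqs are spatial frequencies in 1/Angstrom, in ascending order; fsc
    holds the matching FSC values. The crossing is located by linear
    interpolation between the two bracketing shells.
    """
    if len(freqs) != len(fsc) or len(freqs) < 2:
        raise ValueError("need matching frequency/FSC lists of length >= 2")
    for i in range(1, len(fsc)):
        if fsc[i] < threshold <= fsc[i - 1]:
            f0, f1 = freqs[i - 1], freqs[i]
            c0, c1 = fsc[i - 1], fsc[i]
            f_cross = f0 + (c0 - threshold) * (f1 - f0) / (c0 - c1)
            return 1.0 / f_cross
    raise ValueError("FSC never crosses the threshold in the sampled range")


if __name__ == "__main__":
    freqs = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]  # 1/Angstrom (synthetic)
    fsc = [0.99, 0.95, 0.80, 0.45, 0.10, 0.02]    # synthetic curve
    print(f"{resolution_at_threshold(freqs, fsc):.2f} A")
```

The reported number depends on how the two maps were obtained (independent half-maps or not), on masking and on any resolution limit applied during refinement, which is why the two datasets specify those conditions alongside their FSC values.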

For the InvG secretin pore dataset, a total of 2,685 movie series were collected in super-resolution mode (0.855 Å per pixel). Motion correction, dose filtering and CTF correction were performed as for the basal body dataset. Approximately 164,300 particles were initially picked in Relion®’ using representative 2D class averages as templates. Relion was used for 2D classification and ~132,800 particles were kept in good class averages. The initial 3D map of the secretin pore was cut from the corresponding part of the basal body initial map and low-pass filtered to 50 Å resolution. Relion 3D classification generated one good 3D map out of four, with ~83,900 particles. Relion was used to perform 3D auto-refinement, both with and without imposing C15 symmetry. Semi-automated post-processing of the maps, including automated soft masking, modulation transfer function correction and B-factor sharpening, was performed in Relion and yielded the final maps. FSCs at the 0.143 criterion reported 4.4 Å resolution for the C1 map and 3.6 Å resolution for the C15 map, using gold-standard refinement procedures. These FSC curves have been corrected for soft-mask effects by high-resolution noise substitution”.
Model building, refinement and modelling. For the inner-membrane rings, the X-ray crystal structures of PrgH (PDB accession number 4G11), PrgKj9_g2 (PDB accession number 4W4M) and PrgKo¢_133 (PDB accession number 4OYC) were docked into the 24-fold averaged basal-body reconstruction with Chimera*’ and refined using density-guided iterative local refinement as implemented in Rosetta“, taking the ring symmetry into account at all steps. Gaps in the model were built manually using Coot where permitted by the density or, in less well-resolved regions, by RosettaCM* guided by the experimental data. Rosetta iterative local refinement and Phenix” real-space refinement were carried out on the complete models (PrgH171–364 and PrgK9-293), and Molprobity** and EMRinger® were used for validation, with the final models having good stereochemistry (Molprobity score 1.15, Ramachandran-favoured 94.39%, outliers 1.07%) and an EMRinger score of 2.1, typical of refined electron microscopy models in the 3–3.5 Å resolution range. For InvG, the density was of sufficient quality to permit manual tracing of most of the Cα backbone in Coot and subsequent sequence assignment. Model refinement and building of regions with less well-resolved density proceeded as above, taking into account the 15-fold symmetry, resulting in a final model covering residues 172–557 (residues 1–171, 217–266 and 558–562 disordered) with a Molprobity score of 1.34 (Ramachandran-favoured 94.28%, outliers 2.41%) and an EMRinger score of 3.1. Rosetta electron-microscopy-guided symmetrical docking of the unresolved InvG N0 and N1 domains was carried out as previously described? using the InvG34–173 X-ray crystal structure (PDB accession number 4G08; C terminus truncated to residue 171) and the unaveraged basal body reconstruction as a restraint. Evolutionary conservation analysis was carried out with ConSurf° using both T3SS and all secretin homologues for InvG. Surface electrostatics analysis was carried out using APBS*!.
LC-MS/MS. In-gel digestion of the basal body sample was carried out as described™. Digested proteins were analysed on a Bruker Impact II Q-TOF mass spectrometer. Peptide separation was carried out on a 50-cm in-house-packed 75-µm C18 column by a Proxeon Easy-nLC UPLC system, using 120-min water:acetonitrile gradients. Eluted peptides were ionized in positive ion mode, collecting MS/MS spectra for the top 15 peaks above 500 counts, with a 30-s dynamic exclusion list. The resulting data files were analysed in MaxQuant v1.5.1.0>*. Identification used a 0.006 Da MS tolerance, a 40 p.p.m. MS/MS tolerance and a Salmonella-specific protein database containing all annotated Salmonella proteins extracted from UniProt.
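The identification settings mix an absolute tolerance (0.006 Da) with a relative one (40 p.p.m.). Because a p.p.m. tolerance scales with mass, the absolute window it corresponds to differs across the m/z range. The helpers below make that conversion explicit; their names are hypothetical and they are not part of MaxQuant's interface.

```python
def ppm_to_da(mz: float, ppm: float) -> float:
    """Absolute window (in m/z units) corresponding to a relative tolerance."""
    return mz * ppm * 1e-6


def mass_error_ppm(observed: float, theoretical: float) -> float:
    """Signed error of an observed value relative to theory, in p.p.m."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed: float, theoretical: float, ppm: float) -> bool:
    """True if the observed value lies within +/- ppm of the theoretical value."""
    return abs(mass_error_ppm(observed, theoretical)) <= ppm


if __name__ == "__main__":
    for mz in (500.0, 1000.0, 2000.0):
        print(mz, round(ppm_to_da(mz, 40), 3))
```

For example, a 40 p.p.m. window is 0.04 units wide at m/z 1,000 but twice that at m/z 2,000.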
Generation of an in-frame, non-polar invG deletion mutant in Salmonella. The sacB gene-based allelic exchange method and the suicide vector pRE112 (ref. 54) were used to generate an in-frame deletion mutant of invG in the streptomycin-resistant S. enterica serovar Typhimurium strain SL1344. Two DNA fragments flanking the invG coding region, upstream (0.9 kb) and downstream (1.1 kb), were generated by PCR using the primer pairs InvG-1 (KpnI site, GGTACC; 5′-GCGGTACCAGTTTCATAATGATTGCATCAGG-3′) and DinvG-R (NheI site, GCTAGC; 5′-GCGCTAGCACTAGAATAACCAGGTGTAACCAG-3′), or DinvG-F (NheI site, GCTAGC; 5′-GCGCTAGCAATATTCTGAAGCAAAGCGGTGCC-3′) and InvG-2 (SacI site, GAGCTC; 5′-GCGAGCTCAGTGAAGAGGGTATGGCTTTAC-3′). The two PCR products were first cloned into pCR2.1-TOPO (Invitrogen) and verified by DNA sequencing. After digestion with KpnI/NheI or NheI/SacI, respectively, the two DNA fragments were gel-purified and cloned into KpnI/SacI-digested pRE112 in a three-way ligation, generating pRE112-ΔinvG, which contains the 0.9-kb upstream and 1.1-kb downstream flanking sequences of invG and the invG gene with an internal in-frame deletion of nucleotides 76 to 1,596 (about 90% of the coding region). The coding regions of the 25 N-terminal and the 31 C-terminal amino acid residues of invG are retained to avoid any polar effects on the expression of the downstream invE gene, whose start codon overlaps the stop codon of invG. An NheI site was introduced at the deletion site. The suicide vector pRE112-ΔinvG was transformed into E. coli strain SM10 λpir by electroporation and introduced into Salmonella strain SL1344 by conjugation. After sucrose selection, Salmonella colonies resistant to sucrose and streptomycin and sensitive to chloramphenicol were screened for the invG deletion by PCR. The invG mutants were further verified by multiple rounds of PCR. The resulting invG deletion mutant showed abolished SPI-1-T3SS-mediated protein secretion, and this secretion defect could be complemented by expressing InvG from a plasmid (pMSD238; ref. 55). For complementation assays, site-directed mutagenesis was carried out using a QuikChange mutagenesis kit (Stratagene).
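The underlining that marked each restriction site in the original primer listing does not survive in plain text, so the sites and the in-frame design can be re-checked programmatically. The sketch below (hypothetical helper names) locates each enzyme's recognition sequence in its primer and confirms that the stated deletion preserves the reading frame and retains 25 N-terminal codons.

```python
# Recognition sequences (5'->3') of the enzymes named in the protocol.
RESTRICTION_SITES = {"KpnI": "GGTACC", "NheI": "GCTAGC", "SacI": "GAGCTC"}

# Primer sequences and the site each one is stated to carry.
PRIMERS = {
    "InvG-1": ("GCGGTACCAGTTTCATAATGATTGCATCAGG", "KpnI"),
    "DinvG-R": ("GCGCTAGCACTAGAATAACCAGGTGTAACCAG", "NheI"),
    "DinvG-F": ("GCGCTAGCAATATTCTGAAGCAAAGCGGTGCC", "NheI"),
    "InvG-2": ("GCGAGCTCAGTGAAGAGGGTATGGCTTTAC", "SacI"),
}


def site_position(primer: str, enzyme: str) -> int:
    """0-based index of the enzyme's recognition site in the primer, or -1."""
    return primer.find(RESTRICTION_SITES[enzyme])


def deletion_is_in_frame(first_nt: int, last_nt: int) -> bool:
    """True if deleting nucleotides first_nt..last_nt (inclusive) keeps the frame."""
    return (last_nt - first_nt + 1) % 3 == 0


if __name__ == "__main__":
    for name, (seq, enzyme) in PRIMERS.items():
        print(name, enzyme, site_position(seq, enzyme))
    # Deleting nucleotides 76-1,596 removes 1,521 nt (507 codons).
    print(deletion_is_in_frame(76, 1596), (76 - 1) // 3, "N-terminal codons kept")
    # The 6-nt NheI site added at the junction contributes two codons.
    print(len(RESTRICTION_SITES["NheI"]) % 3 == 0)
```

Each primer carries its site two bases from the 5′ end, after a short GC clamp that helps the enzyme cut near the end of the PCR product.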

SPI-1 secretion assays. Salmonella strains were grown overnight in LB broth (supplemented with antibiotics if required) at 37 °C with shaking at 225 r.p.m. The cultures were then diluted 1:100 into 4 ml of fresh LB with appropriate antibiotics and grown under the same conditions for 6 h to induce SPI-1 type III secretion. The cultures were then centrifuged at 16,100g for 10 min to pellet the bacteria, and the bacterial pellet was resuspended in SDS-PAGE sample buffer to generate whole-cell lysates. The culture supernatant was collected and passed through a Millex-GV 0.22 µm filter unit (Millipore) to remove any remaining bacteria, and the secreted proteins were precipitated with trichloroacetic acid at a final concentration of 10% (v/v). The precipitated proteins were then collected by centrifugation at 16,100g for 30 min; the protein pellet was air-dried and dissolved in SDS-PAGE sample buffer, with residual trichloroacetic acid neutralized using 0.5 µl of saturated Tris. The volume of sample buffer used to resuspend the bacterial pellet or dissolve the precipitated proteins was normalized according to the A600 values of the cultures to ensure equal loading of the samples. The secreted proteins were run on 12% SDS-PAGE gels and stained with Coomassie blue G250 (see Supplementary Information panels a–d for original SDS-PAGE gels and anti-InvG western blots). We note that some mutant proteins are present at sub-wild-type levels, especially the AHL mutants implicated in membrane association. We propose that this results from decreased stability owing to impaired membrane association, insertion and oligomerization.
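Normalizing by A600 means that each resuspension volume is made proportional to the optical density of its culture, so that equal gel-loading volumes contain material from equal numbers of cells. A minimal sketch, assuming that the densest culture receives a chosen reference volume (the function name, the reference volume and the example readings are hypothetical):

```python
from typing import Dict


def normalized_volumes(a600: Dict[str, float],
                       reference_volume_ul: float) -> Dict[str, float]:
    """Sample-buffer volume for each culture, proportional to its A600.

    The densest culture receives reference_volume_ul; every other culture
    receives proportionally less, so equal loaded volumes contain material
    from equal numbers of cells.
    """
    if not a600 or any(v <= 0 for v in a600.values()):
        raise ValueError("A600 values must be positive")
    a_max = max(a600.values())
    return {name: reference_volume_ul * a / a_max for name, a in a600.items()}


if __name__ == "__main__":
    readings = {"wild type": 1.6, "delta invG": 1.5, "AHL mutant": 1.2}  # hypothetical
    for strain, vol in normalized_volumes(readings, 100.0).items():
        print(f"{strain}: {vol:.1f} ul")
```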

InvG localization assay. A 50-ml secretion assay culture was collected, washed in PBS and resuspended in 1 ml of 20 mM Tris-HCl, pH 7.5, 150 mM NaCl, 3 mM MgCl2, 1 mM CaCl2, with protease inhibitors (Roche). The sample was lysed by sonication and centrifuged at 5,000g to remove unlysed cells and debris. Supernatants were centrifuged in a Beckman Optima TLX ultracentrifuge for 60 min at 100,000g to pellet the membranes. The supernatant, corresponding to the soluble protein fraction, was collected and the membrane pellet was washed once in lysis buffer. The final pellet, corresponding to the membrane fraction, was resuspended in 1 ml lysis buffer supplemented with 1% SDS. See Supplementary Information panel e for the original anti-InvG western blot.

Expression and purification of InvH27–147 and InvG520–562. To verify the interaction between the pilotin protein InvH and the S domain of the secretin InvG, the proteins were first isolated for use in isothermal titration calorimetry (ITC). To avoid complications from stabilizing full-length lipidated InvH, an N-terminally truncated construct, InvH27–147, was purified. His-tagged InvH27–147 and Hisj9-tagged InvG520–562 (wild type and the Trp448Ala/Lys449Ala/Val552Ala mutant) were expressed from pET28a in E. coli BL21 (DE3) at 22 °C. Cells were collected by centrifugation and resuspended in 4 ml lysis buffer (20 mM HEPES (pH 7.5), 500 mM NaCl, 15 mM imidazole, one protease inhibitor mixture tablet (Roche)) per gram of cell paste. The cells were lysed with an Avestin cell homogenizer and centrifuged at 40,000g for 30 min to pellet insoluble material. The supernatant was loaded onto a 1-ml HisTrap HP Ni Sepharose column (GE Life Sciences) and washed with 40 ml of wash buffer (20 mM HEPES (pH 7.5), 500 mM NaCl, 50 mM imidazole), followed by a 10-ml wash with 75 mM imidazole. InvH27–147 was eluted in 10–20 ml of 20 mM HEPES (pH 7.5), 500 mM NaCl and 250 mM imidazole, whereas both the wild-type and mutant InvG520–562 constructs were eluted at 500 mM imidazole. The eluted fractions were desalted into 20 mM HEPES pH 7.5, 500 mM NaCl, and the His-tag was cleaved overnight at 4 °C with 1:1,000 bovine α-thrombin (HTI). The thrombin-treated sample was incubated with 500 µl of Ni-charged Chelating Sepharose FF (GE Life Sciences) at 4 °C for 30 min to remove uncleaved protein. The sample was concentrated and loaded onto a Superdex 75 10/300 GL column (GE Life Sciences) equilibrated with 20 mM HEPES (pH 7.5), 500 mM NaCl. The protein eluted as a single peak, and protein-containing fractions were pooled and concentrated to 20 mg ml⁻¹ (InvH27–147) and 5 mg ml⁻¹ (wild-type and mutant InvG520–562).

Isothermal titration calorimetry. ITC was performed using a MicroCal iTC200 (Malvern). All protein samples were dialysed overnight against 20 mM HEPES pH 7.5, 500 mM NaCl. Wild-type InvG520–562 (30 µM) in the ITC cell (200 µl volume) was titrated with twenty 2-µl injections of InvH27–147 (300 µM). The heat of dilution of the titrant was corrected for by subtracting a buffer control. Titrations of the Lys448Ala/Trp449Ala/Val552Ala triple mutant of InvG520–562 were performed in the same way, but with 100 µM InvG520–562 in the cell and 1,000 µM InvH27–147 as the titrant to compensate for the weaker binding. Four experiments were performed at 25 °C. Data points were fitted to a one-binding-site model using Origin 7 software (OriginLab Corporation), which optimizes the binding stoichiometry (N), the association constant (K) and the binding enthalpy (ΔH) on the basis of the fit to the data.
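The one-binding-site model is the standard Wiseman isotherm. A minimal sketch of its core, assuming n identical, independent sites, is shown below: it solves the mass-action quadratic for the concentration of occupied sites and converts that to cumulative heat. It omits the per-injection dilution and displaced-volume corrections that the instrument software applies, so it illustrates the model rather than replacing the fitting routine; the function names are hypothetical.

```python
import math


def bound_concentration(m_total: float, x_total: float, k_a: float,
                        n: float = 1.0) -> float:
    """Concentration of occupied sites (M) for n identical, independent sites.

    Solves K_a = [MX] / ((n*M_t - [MX]) * (X_t - [MX])) for the physical
    (smaller) root, written in a form that avoids cancellation at high K_a.
    """
    sites = n * m_total
    b = sites + x_total + 1.0 / k_a
    disc = max(b * b - 4.0 * sites * x_total, 0.0)
    return 2.0 * sites * x_total / (b + math.sqrt(disc))


def total_heat(m_total: float, x_total: float, k_a: float, delta_h: float,
               cell_volume_l: float, n: float = 1.0) -> float:
    """Cumulative heat Q = V0 * dH * [bound]; units follow delta_h (e.g. cal/mol)."""
    return cell_volume_l * delta_h * bound_concentration(m_total, x_total, k_a, n)


if __name__ == "__main__":
    m_cell = 30e-6  # cell concentration (M), as in the wild-type titration
    k_a = 1e7       # hypothetical association constant (M^-1)
    for ratio in (0.25, 0.5, 1.0, 1.5, 2.0):
        frac = bound_concentration(m_cell, ratio * m_cell, k_a) / m_cell
        print(f"molar ratio {ratio:.2f}: fraction of sites occupied {frac:.3f}")
```

The sharpness of the transition around a molar ratio of N is what constrains K in the fit, which is why the much weaker mutant interaction was measured at higher concentrations.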

Circular dichroism. Circular dichroism spectra of wild-type InvH27–147 and InvG520–562 were measured using a Jasco J-810 circular dichroism spectrometer (Jasco Incorporated). Spectra of the individual proteins were measured at a concentration of 11 µM, whereas InvH27–147 and InvG520–562 were combined at a final concentration of 11 µM each to record the spectrum of the complex. Spectra were recorded in 25 mM Tris pH 7.5, 150 mM NaF at 25 °C with a path length of 0.1 cm. Each spectrum represents the average of four scans, collected from 280 nm to 190 nm, with a spectral bandwidth of 1 nm and a response time of 2 s.

Data availability. Cryo-EM maps and atomic coordinates have been deposited 
with the Electron Microscopy Data Bank (with accession codes EMD-8398, 
EMD-8399, EMD-8400, EMD-8401) and Protein Data Bank (accession codes
5TCP, 5TCQ and 5TCR). The mass spectrometry data have been deposited with 
the ProteomeXchange Consortium via the PRIDE” partner repository with the 
dataset identifier PXD005133. 
Extended Data Figure 1 | 3D reconstructions of the basal body complex. a, Representative micrograph of the basal body complex (a total of 2,515 micrographs were recorded). b, Selected reference-free 2D class averages. c, d, The side view (c) and the bottom view (d) of the C1 (no symmetry imposed) reconstructed map of the basal body complex. e, f, The side view (e) and the bottom view (f) of the C24 (24-fold symmetry imposed) reconstructed map of the basal body complex. g, h, FSC of the C1 reconstruction (g) and the C24 reconstruction (h) calculated in Frealign using unmasked maps. i, j, Local resolution estimations from ResMap for the C1 map (i) and the C24 map (j). Arrows in c and e indicate the location of the slice. k, Representative density for the inner-membrane rings (4.3 Å resolution).
Protein ID   MW (kDa)   Peptides   Relative intensity
INVG         61.7       51         1
PRGK         30         14         0.40
PRGH         28.2*      26         0.32
SPAP         25.2       13         0.06
INVA         76.1       12         0.004
SPAQ         9.4        3          0.002
SPAO         33.8       1          0.0016
SIPA         73.9       14         0.0013
INVC         47.6       6          0.0005
SICA         19.2       3          0.00035
INVH         16.5       1          0.00016

Extended Data Figure 2 | PrgH130–392 basal body composition and oligomerization of PrgH and PrgK. a, Proteins from the purified PrgH130–392 basal body identified by LC-MS/MS. InvG, PrgH, PrgK and SpaP were detected with high intensity. InvA and SpaQ peptides were also detected, at lower abundance. Some cytoplasmic export apparatus proteins and effectors were detected at very low levels, indicating that they can still interact with the basal body complex. Only T3SS proteins are shown. *MW of PrgH130–392. b, Schematic of PrgH and PrgK domain topology and position, as observed in our basal body structure. The PrgH cytoplasmic D1 domain is absent in the PrgH130–392 mutant and its location with respect to the basal body is unclear. c, Structure of PrgH171–364 (green) and PrgK»o_203 (orange). Previously unresolved PrgK loops now observed in the assembled state are labelled and shown as sticks. d, Role of the PrgK D1–D2 linker in PrgK oligomerization. Residues 84–90 form a helix, with Phe89 (shown as spheres) inserting between neighbouring D2 domains. e, The D2 transmembrane loop contributes to oligomerization by interacting with both the PrgK D1 domain and neighbouring PrgH protomers (green/yellow). Leu195 is the equivalent termination position in EPEC EscJ, and the local environment of Lys203 was previously supported by chemical cross-linking?.
Extended Data Figure 3 | 3D reconstruction of the isolated secretin. a, Representative micrograph of the isolated secretin (a total of 2,685 micrographs were recorded). b, Selected reference-free 2D class averages. c, d, The bottom view (c) and side view (d) of the C1 (no symmetry imposed) reconstructed map of the isolated secretin. e, FSC curve of the C1 reconstruction using gold-standard refinement with soft-masking-effect correction. f, g, Bottom view (f) and side view (g) of the C15 (15-fold symmetry imposed) reconstructed map of the isolated secretin. h, FSC curve of the C15 reconstruction using gold-standard refinement with soft-masking-effect correction. i, Representative density for the isolated secretin (3.6 Å resolution).
Extended Data Figure 4 | Secondary structure topology of the InvG secretin. Secondary structure topology of InvG172–557. β-strands of the secretin domain are numbered, with 1, 3a/3b, 8 and 9 forming the outer β-barrel; 4–7 forming the inner β-barrel; and 1, 2 and 3a forming the lip of the β-barrel. Strand 3 is broken into 3a and 3b by the conserved residue Pro371.
Extended Data Figure 5 | Sequence conservation of secretins and structure-based mapping of previously characterized PulD oligomerization mutants. a, InvG172–557 sequence conservation (see Methods), coloured from magenta (highest) to cyan (lowest) for T3SS homologues. b, c, Close-up views of boxed regions in a. Pro371 and Gly471 (see f), highly conserved in all secretins (see Extended Data Fig. 6a), are labelled in c. d, PulD multimerization mutants mapped onto the InvG structure (shown as spheres), coloured according to Fig. 2. Residues are labelled with letters and correspondingly annotated in e (N3 domain alignment; a–b) or Extended Data Fig. 6a (secretin domain alignment; c–t). e, Alignment of the N3 domain (InvG180–300) from T3SS (green) and non-T3SS (orange) secretin homologues (see Fig. 3d). Conserved regions are lettered red and boxed; invariant residues are boxed in solid red. Secondary structural elements observed here are annotated and numbered, consistent with other RBMs. PulD multimerization and pIV permeability mutants mapping to this domain are indicated by letters or numbers, respectively, and shown in d (for PulD) or Extended Data Fig. 10b (for pIV). f, InvG complementation assay for N3 domain ring-interface mutants (Leu293Arg, Leu293Ala; labelled in d) and conserved secretin domain β-sandwich mutants (Pro371Leu, Gly471Ala; also labelled in c and Extended Data Fig. 6a). Lysate InvG protein levels are indicated below, detected using anti-InvG antibody.
Extended Data Figure 6 | Sequence alignment for T3SS and non-T3SS secretin domains. a, Alignment of the secretin domain (InvG302–519) from T3SS (green) and non-T3SS (orange) secretin homologues (see Fig. 3d). Anti-parallel strands of the outer and inner β-sheets are coloured cyan and green, respectively, with loop regions (where most indels are located) denoted by a pattern of diagonal lines. Strands are numbered according to Extended Data Fig. 4. Conserved residues Pro371 and Gly471 (Extended Data Fig. 5f) are marked with asterisks. PulD multimerization and pIV permeability mutants mapping to this domain are indicated by letters or numbers, respectively, and shown in Extended Data Fig. 5d (PulD) or Extended Data Fig. 10b (pIV). b, Corresponding structural elements. The residues that make up the putative membrane-interacting region (gold) are boxed in both the structure and the sequence alignment.
Extended Data Figure 7 | Secretin oligomerization and stoichiometry. a, Secretin domain interface between neighbouring monomers (denoted as i and i+1). To differentiate adjacent neighbours and their extensive interfaces, the i ribbon is coloured as in Fig. 2, with the residues forming the oligomeric interfaces shown as sticks. The i+1 ribbon is coloured grey, with the residues at the oligomeric interface shown as transparent van der Waals spheres and coloured according to domain feature. b, c, Role of the N3 (cobalt blue) and S (red) domains in secretin multimerization. The i ribbon is coloured as in Fig. 2, with N3 and S domain interface residues shown as sticks; neighbouring monomers (i+1 and i+2) are coloured grey and white, respectively (and also delineated by a dashed line), with the residues interacting with the N3 or S domains of monomer i displayed as spheres and coloured as in Fig. 2. The N3 domain of monomer i (blue sticks) interfaces with both the N3 domain (blue spheres) and the underside of the inner β-sheet (green spheres) of monomer i+1. The S domain of monomer i (red sticks) forms an extensive stapled interface with the outer β-sheet of monomers i+1 and i+2 (cyan spheres) and with the N-terminal S domain helix of monomer i+1 (red spheres). d, e, Superimposition of available secretin electron microscopy maps and a table showing EMD accession number, resolution, stoichiometry (n) and molecular weight (MW) of the secretin domain monomer. All secretins have the same general architecture, with GspD having an elongated upper lip and an enclosed upper chamber as a result of the loop 1 insertion following strand 2 (Extended Data Fig. 6a). MxiD (S. flexneri T3SS), YscC (Y. enterocolitica T3SS), PscC (P. aeruginosa T3SS), GspD (V. cholerae T2SS), PulD (K. oxytoca T2SS).

Secretin   EMD           Res. (Å)   n    MW (kDa)
InvG       This study    3.6        15   24.0
InvG       1875          10         15   24.0
MxiD       1617          25         12   24.6
YscC       5720, 5721    15         12   25.2
PscC       2629          14.4       12   25.3
GspD       1763          19         12   28.6
PulD       2628          8.2        12   26.2
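Because the table gives the mass of a single secretin-domain monomer together with the ring stoichiometry n, the approximate mass of the assembled secretin-domain ring is n × MW (for example, 15 × 24.0 kDa = 360 kDa for InvG). The short sketch below applies that arithmetic to each entry. It deliberately ignores the N-terminal domains, lipidation and any bound pilotin, so the values are not masses of the full complexes; the class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecretinRing:
    """One row of the table: secretin, stoichiometry n and monomer MW (kDa)."""
    name: str
    n: int
    monomer_kda: float

    @property
    def ring_kda(self) -> float:
        # n copies of the secretin-domain monomer only; N-terminal domains,
        # lipids and bound pilotin are not included.
        return self.n * self.monomer_kda


ROWS = [
    SecretinRing("InvG (this study)", 15, 24.0),
    SecretinRing("MxiD", 12, 24.6),
    SecretinRing("YscC", 12, 25.2),
    SecretinRing("PscC", 12, 25.3),
    SecretinRing("GspD", 12, 28.6),
    SecretinRing("PulD", 12, 26.2),
]

for row in ROWS:
    print(f"{row.name:<18} n = {row.n:>2}  ring ~ {row.ring_kda:6.1f} kDa")
```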
Extended Data Figure 8 | Interaction of the secretin S domain with the pilotin. a, InvG S domain (red) superposed with the MxiM/MxiD pilotin-bound secretin peptide (yellow) from S. flexneri. The electronegative MxiD Glu555 and Asp556 superpose with InvG Asp543 and Asp544. InvG Asp544 forms a salt bridge with Lys315, which is also conserved between MxiD and InvG (shown as sticks). Ordering of this helical peptide of the secretin S domain has been shown to occur upon binding to the pilotin. b, Superimposition of the pilotin MxiM (green surface) onto InvG, based on binding to a common secretin peptide as in a, showing that the S domain orientation is permissive of an assembled interaction with pilotin in the oligomerized form. The putative linker (solid line) and membrane-inserted lipidation are shown. OM, outer membrane. c, Far-UV circular dichroism spectra (each an average of four scans) for InvG520–562 (blue line), InvH17–147 (green line) and the complex (red line), showing an approximately 20% increase in ellipticity at 222 nm compared with the calculated combined spectrum (red dashed line). This is indicative of increased helical content in the complex and consistent with a disorder-to-order transition in the InvG S domain. d, e, ITC analysis of the interaction between InvH17–147 and InvG520–562 (d) or InvG520–562(K548A/W549A/V552A) (e). A representative run is shown (of four runs), and Kd and N (stoichiometry) are reported as mean (standard deviation) from four runs. f, Generalized schematic of pilotin domains: an N-terminal type II signal sequence followed by a lipobox lipidation signal with a conserved cysteine, connected to a structurally diverse globular C-terminal secretin-binding domain via a variable-length linker. g, InvG complementation assay for S domain deletion mutants and the C-terminal helix triple mutant. Lysate InvG protein levels are indicated below, detected using anti-InvG antibody.
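Panel c compares the measured spectrum of the complex with a "calculated combined" spectrum. A minimal sketch of that comparison is given below. It assumes, as a non-interacting reference, that the combined spectrum is the point-by-point sum of the two component spectra recorded on the same wavelength grid and at the same concentrations as in the mixture; the caption does not state how the combined spectrum was calculated. The spectra in the example are made-up Gaussian shapes, not the measured data.

```python
import numpy as np


def percent_change_vs_sum(wavelengths, complex_spectrum, component_spectra, at_nm=222.0):
    """Percent change in |ellipticity| of the complex relative to the summed components.

    Assumes all spectra share one wavelength grid and that each component was
    measured at its concentration in the mixture (non-interacting reference).
    """
    wavelengths = np.asarray(wavelengths, dtype=float)
    i = int(np.argmin(np.abs(wavelengths - at_nm)))  # grid point nearest at_nm
    combined = np.sum([np.asarray(s, dtype=float) for s in component_spectra], axis=0)
    observed = abs(float(np.asarray(complex_spectrum, dtype=float)[i]))
    expected = abs(float(combined[i]))
    return 100.0 * (observed - expected) / expected


# Illustrative, made-up spectra (mdeg) on a 1-nm grid
wl = np.arange(200.0, 261.0, 1.0)
peptide = -2.0 * np.exp(-((wl - 222.0) / 15.0) ** 2)   # stands in for the S-domain peptide
pilotin = -6.0 * np.exp(-((wl - 218.0) / 20.0) ** 2)   # stands in for the pilotin
complex_ = 1.2 * (peptide + pilotin)                    # hypothetical complex spectrum
print(round(percent_change_vs_sum(wl, complex_, [peptide, pilotin]), 1))
```

Taking absolute values compares the magnitude of the (negative) helical signal at 222 nm, which is what an "increase in ellipticity" refers to here.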
Extended Data Figure 9 | InvG membrane localization and proposed pathway for pore assembly and membrane insertion. a, Distribution of wild-type (WT) InvG and mutants in whole-cell lysate, soluble and membrane fractions. The ability to form SDS-resistant oligomers was assessed by running both boiled (+) and unboiled (−) samples. The AHL mutant InvG(F486A/L487A/L490A/L492A/I493A/L496A/F497A) and the transmembrane β-strand mutant InvG(V337G/L339G) (mutants that abrogate and reduce secretion, respectively; see Fig. 3e) both localize to the membrane (the AHL mutant to a lesser degree); however, their ability to form SDS-resistant oligomers is substantially affected, and protease sensitivity is evident for the AHL mutant. Both observations are consistent with aberrant membrane association, insertion and final stabilized assembly. b–d, Schematic of the proposed secretin assembly and membrane insertion pathway. The secretin monomer is localized to the outer membrane in a Lol-dependent manner, most likely in a complex with the pilotin, where we propose that initial membrane association is mediated by the conserved amphipathic loop (AHL; gold). Oligomerization to a membrane-associated pre-pore intermediate follows, which, based on our structure and earlier negative-stain electron microscopy of this form in PulD (c; figure reproduced from ref. 23), would encompass a folded secretin core and the peripheral N3 and S domains, with the extensive interfaces between them (see Extended Data Fig. 7a–c) stabilizing this pre-pore form. Subsequent folding of the remainder of the secretin β-domain (disordered in the pre-pore image) to create the upper β-barrel lip (gold) leads to the final, BAM-independent, membrane insertion and creation of the membrane-spanning secretin pore. Outer-membrane curvature observed in situ in several T3SSs is illustrated by overlaying the secretin structure with the in situ tomography of the Shigella injectisome (d; figure reproduced from ref. 20), which also shows continuous density, proposed to be the pilotin (circled), connecting the inner leaflet of the outer membrane and the region corresponding to the S domain in our InvG structure.
Extended Data Figure 10 | The secretin periplasmic gate. a, Superimposition of the structure of InvG (coloured according to Fig. 2), the basal body map (cyan), the approximately 10 Å needle-complex map (EMD-2481; grey) and the approximately 20 Å needle-complex map (EMD-1100; pink). The dimensions of our basal body are more similar to those of EMD-1100. Substrate passage through the secretin would require N3 domain reorientation and subsequent opening of the periplasmic gate, indicated with black arrows. Differences in the regions corresponding to the upper secretin β-sandwich in the needle-complex maps (circled) suggest a conformation of the open gate packed against the outer β-sheet. b, InvG structure, showing the periplasmic gate and N3 domain residues implicated in gating. InvG mutants with a secretion-deficient phenotype (shown in c) are shown as balls and sticks, and the phage secretin pIV permeability mutants localized at the N3–secretin interface (additional pIV permeability mutants are mapped throughout the inner β-sheet) are shown as spheres (labelled 1–4 and correspondingly annotated in Extended Data Fig. 5e (N3 domain alignment; 1–2) or Extended Data Fig. 6a (secretin domain alignment; 3–4)). c, InvG complementation assay for periplasmic-gate and N3-hairpin mutants. Lysate InvG protein levels are indicated below, detected using anti-InvG antibodies. d, Electrostatic surface of the InvG pore structure, including the Rosetta-modelled N0 and N1 domains, as viewed from the periplasmic face. e, Electrostatic surface (left) of the Salmonella SPI-1 needle (PDB 2LPZ) and corresponding ribbon representation (right). A single PrgI monomer is coloured magenta. f, Superimposition of the Salmonella SPI-1 needle (purple) onto the InvG172–557 pore structure (blue), as viewed from the extracellular face.
High-resolution crystal structure of the human CB1 cannabinoid receptor


Zhenhua Shao, Jie Yin, Karen Chapman, Magdalena Grzemska, Lindsay Clark, Junmei Wang & Daniel M. Rosenbaum


The human cannabinoid G-protein-coupled receptors (GPCRs) CB1 and CB2 mediate the functional responses to the endocannabinoids anandamide and 2-arachidonyl glycerol (2-AG) and to the widely consumed plant phytocannabinoid Δ9-tetrahydrocannabinol (THC). The cannabinoid receptors have been the targets of intensive drug discovery efforts, because modulation of these receptors has therapeutic potential to control pain, epilepsy, obesity and other disorders. Although much progress in understanding the biophysical properties of GPCRs has recently been made, investigations of the molecular mechanisms of the cannabinoids and their receptors have lacked high-resolution structural data. Here we report the use of GPCR engineering and lipidic cubic phase crystallization to determine the structure of the human CB1 receptor bound to the inhibitor taranabant at 2.6-Å resolution. We found that the extracellular surface of CB1, including the highly conserved membrane-proximal N-terminal region, is distinct from those of other lipid-activated GPCRs, forming a critical part of the ligand-binding pocket. Docking studies further suggest how this same pocket may accommodate the cannabinoid agonist THC. Our CB1 structure provides an atomic framework for studying cannabinoid receptor function and will aid the design and optimization of therapeutic modulators of the endocannabinoid system.

The endocannabinoid signalling system in mammals comprises endogenous lipid messengers (anandamide and 2-AG) and two homologous GPCRs (CB1, which is located in the nervous system and periphery, and CB2, which is expressed primarily in immune cells). Human CB1 and CB2 (which share 42% sequence identity) are also activated by natural products such as THC and by synthetic cannabinoids, and can be inhibited by diverse subtype-selective and non-selective antagonists and inverse agonists. CB1 is the most abundant GPCR in the central nervous system (CNS) and regulates diverse brain functions and behaviours, modulating neurotransmitter release and neuronal excitation through the presynaptic activation of the G protein Gi (inhibiting adenylate cyclase), GIRK channels, and arrestin/MAP kinase signalling. Endocannabinoids are synthesized postsynaptically by lipases and travel across synapses in a retrograde manner, embedding in the presynaptic membrane, where they can activate CB1. Beyond the CNS, CB1 signalling in peripheral tissues has been implicated in other physiological mechanisms such as release of the metabolic hormones leptin and insulin. However, the mechanism by which lipidic or lipophilic cannabinoid agonists access their GPCR-binding sites and promote receptor activation through specific binding interactions is, as yet, unknown.

Although humans have been consuming phytocannabinoids for their psychotropic effects for thousands of years, THC was only identified as the active chemical constituent of Cannabis sativa in 1964 (ref. 12). Recently, alternative therapeutic uses for cannabinoid ligands have been pursued. As the endocannabinoid system is involved in the regulation of energy metabolism, synthetic inverse agonists such as rimonabant and taranabant have proven effective in the clinic for treatment of obesity, but have failed to secure regulatory approval owing to adverse CNS side effects. Peripheral blockade of CB1 by inverse agonists that do not penetrate the CNS may represent an alternative therapeutic strategy for treating obesity, while avoiding CB1 receptors in the CNS. Natural and synthetic cannabinoid ligands have also shown considerable promise in the treatment of neuropathic pain and epilepsy-induced seizures. To gain further insight into the molecular mechanisms of cannabinoid system modulators and aid structure-based ligand design, we sought to crystallize the human CB1 receptor and solve its atomic structure.
Obtaining diffraction-quality crystals of CB1 required optimization of both the construct and the purification technique. We carried out differential scanning fluorimetry on the detergent-solubilized receptor, which identified the inverse agonist taranabant as a ligand conferring enhanced thermostability (Methods and Extended Data Fig. 1). To promote lipidic cubic phase (LCP) crystallization, we replaced the third intracellular loop (ICL3) of CB1 with the thermostable PGS (Pyrococcus abyssi glycogen synthase) domain, which recently proved essential in helping solve crystal structures of the human orexin receptors. We also incorporated the point mutation T210A, which was previously shown to stabilize the inactive conformation of CB1 and increase thermostability. Finally, we truncated CB1(T210A)-PGS by eliminating the first 89 N-terminal residues and the C terminus after residue 421. The engineered construct binds to the inverse agonists taranabant and


Figure 1 | Global structure of CB1 bound to taranabant. a, CB1 is represented as a teal cartoon. The taranabant ligand is shown as spheres with magenta carbon atoms. Views are from within the plane of the membrane (top) and from the extracellular space (bottom). b, Solvent-accessible surface representation of CB1 from the same views as in a. c, CB1 surface representation coloured according to electrostatic potential, from red (negative) to blue (positive).


Figure 2 | Membrane-proximal N-terminal region of CB1. a, Interaction between the membrane-proximal N-terminal region and the rest of the receptor. Residues 100–112 are shown as orange sticks, and taranabant is shown as magenta spheres. The remainder of CB1 is depicted as a teal solvent-accessible surface. b, The extracellular region of CB1, with the receptor as a teal cartoon and taranabant as magenta sticks. c, The S1P1 receptor (PDB accession number 3V2Y) is depicted as a blue cartoon, from the same perspective as in a after superposition with CB1. The ML056 antagonist is shown as blue sticks. d, The LPA1 receptor (PDB accession number 4Z35) as a salmon cartoon, from the same perspective as in a after superposition with CB1. The ONO 9910539 antagonist is shown as salmon sticks. e, The GPCR rhodopsin (PDB accession number 1F88) as a grey cartoon, from the same perspective as in a after superposition with CB1. The 11-cis-retinal inverse agonist ligand is shown as cyan sticks. Glycosyl moieties in the N-terminal region are removed for clarity.


rimonabant (also denoted SR141716A) in a manner nearly identical to wild-type CB1. CB1(T210A)-PGS does, however, have a sevenfold lower affinity for the agonist CP55940, consistent with stabilization of an inactive conformation (Extended Data Fig. 2) and in agreement with the original report of the T210A mutation. After purifying this construct from Sf9 insect cells (Methods and Extended Data Fig. 3), we obtained LCP microcrystals that diffracted to 2.6 Å resolution, solved the structure by molecular replacement and refined it to an Rfree value of 0.23 (Methods and Extended Data Table 1). In the monoclinic crystals, CB1(T210A)-PGS packs such that the extracellular-facing ligand-binding region is not involved in lattice contacts (Extended Data Fig. 4a), and the receptor and ligand are well ordered, with low overall B factors. Although truncation of the N terminus of CB1 was necessary to form diffraction-quality crystals, such modifications may affect the functional properties of the receptor, as indicated by the variable expression and pharmacology of tissue-specific splice variants in this region. Nevertheless, our binding data (Extended Data Fig. 2) and previous work show that the basic inverse-agonist- and agonist-binding properties of CB1 are maintained in a receptor lacking the N-terminal 89 residues (a region that also contains three consensus N-linked glycosylation motifs).
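The thermostability screen described above relies on reading a melting temperature (Tm) off each differential scanning fluorimetry curve. One common way to do this is sketched below: take the temperature at the maximum of the first derivative dF/dT. The sketch assumes that fluorescence rises as the receptor unfolds (as with thiol-reactive dyes such as CPM); the dye and analysis actually used are described in the Methods and may differ. The curves are synthetic, not data from this study, and the function names are illustrative.

```python
import numpy as np


def melting_temperature(temps: np.ndarray, signal: np.ndarray) -> float:
    """Tm estimated as the temperature of the steepest rise in fluorescence."""
    slope = np.gradient(signal, temps)  # numerical dF/dT
    return float(temps[np.argmax(slope)])


def tm_shift(temps: np.ndarray, apo: np.ndarray, with_ligand: np.ndarray) -> float:
    """Ligand-induced change in Tm; positive values mean stabilization."""
    return melting_temperature(temps, with_ligand) - melting_temperature(temps, apo)


def two_state_curve(temps: np.ndarray, tm: float, width: float = 2.0) -> np.ndarray:
    """Synthetic sigmoidal unfolding curve used only for illustration."""
    return 1.0 / (1.0 + np.exp((tm - temps) / width))


temps = np.arange(20.0, 90.5, 0.5)
apo = two_state_curve(temps, tm=48.0)
ligand = two_state_curve(temps, tm=56.0)
print(melting_temperature(temps, apo), tm_shift(temps, apo, ligand))
```

In a ligand screen, each candidate would be ranked by its Tm shift relative to the apo receptor; the derivative approach avoids assuming a particular curve shape but is sensitive to noise, so real curves are often smoothed or fitted first.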

The global structure of the CB1 receptor, with its classical seven-transmembrane fold, is shown in Fig. 1. Using other rhodopsin family (class A) GPCRs as guides, the taranabant-bound CB1 structure represents an inactive conformation with respect to G-protein binding, with a canonical ionic lock formed between Arg214^3.50 and Asp338^6.30 (distance, 3.4 Å; Ballesteros–Weinstein numbering used in superscript). At the extracellular surface, the second extracellular loop (ECL2) and the membrane-proximal N-terminal region preceding transmembrane domain 1 (TM1) form a lid over the orthosteric pocket, which almost completely shields taranabant from the solvent (Fig. 1a, b). As observed in the structure of the lipid-activated GPCR S1P1 (ref. 19), a gap between TM1 and TM7 in the extracellular leaflet (Fig. 1b) may contribute to a membrane-embedded access channel for lipophilic agonists. Further dilation of this channel, which is lined by the highly conserved residues Ile119^1.35, Phe381^7.37 and Met384^7.40 (Extended Data Fig. 5), would be required to facilitate entry of ligands. Previous molecular dynamics simulations proposed that the endocannabinoid 2-AG enters the homologous CB2 receptor between TM6 and TM7 (ref. 20); however, these two transmembrane domains are tightly associated in the present structure. Taranabant makes multiple contacts with both TM1 and TM7 and fills the orthosteric pocket directly inside the TM1–TM7 opening, potentially acting as a plug that blocks entry of the endocannabinoid. The extracellular face and lid above the orthosteric pocket contain an abundance of acidic residues, giving a negatively charged surface that will energetically disfavour interaction with negatively charged ligands (Fig. 1c). This feature of CB1 may help to ensure lipid-binding selectivity in a bilayer containing a high concentration of negatively charged phospholipids.
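In Ballesteros–Weinstein numbering, the most conserved residue of each transmembrane helix is labelled X.50, and every other residue in that helix is numbered by its offset from it. In the sketch below, the CB1 reference positions are back-calculated from labels quoted in this paper (for example, Arg214 at 3.50, and Asp338 at 6.30, which places 6.50 at residue 358). TM4 is omitted because no TM4 label is quoted. The helper assumes a contiguous helix without bulges or gaps and is an illustration, not a general-purpose GPCR numbering tool; the names are hypothetical.

```python
# Residue number of the X.50 reference in each CB1 helix, back-calculated from
# labels quoted in the text (e.g. Arg214 = 3.50; Asp338 = 6.30 -> 6.50 = 358).
# TM4 is omitted because no TM4 label is quoted.
CB1_X50 = {1: 134, 2: 163, 3: 214, 5: 286, 6: 358, 7: 394}


def bw_label(residue: int, helix: int, x50: dict = CB1_X50) -> str:
    """Ballesteros-Weinstein label 'helix.position' for a residue in a given helix.

    Assumes no bulges, insertions or deletions between the residue and the
    X.50 reference, which does not hold for every helix of every GPCR.
    """
    if helix not in x50:
        raise KeyError(f"no X.50 reference recorded for TM{helix}")
    position = 50 + (residue - x50[helix])
    if not 1 <= position <= 99:
        raise ValueError("residue lies too far from the X.50 reference")
    return f"{helix}.{position:02d}"


def residue_number(label: str, x50: dict = CB1_X50) -> int:
    """Inverse mapping, e.g. '6.48' -> 356 for CB1."""
    helix, position = (int(part) for part in label.split("."))
    return x50[helix] + position - 50


for name, resid, helix in [("Arg", 214, 3), ("Asp", 338, 6), ("Trp", 356, 6), ("Ser", 383, 7)]:
    print(f"{name}{resid}^{bw_label(resid, helix)}")
```

Because the labels are relative to a conserved anchor in each helix, they let residues such as CB1 Ile119^1.35 and opsin Leu40^1.35 be compared directly even though their sequence numbers differ.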

The first N-terminal residue of CB1 observed in the electron density of our crystals is Glu100. The 13 membrane-proximal amino acids that precede TM1 fold over the ligand-binding pocket and interact with TM2, TM3, ECL2 and TM7 (Fig. 2a, b). This region is highly conserved in CB1 (Extended Data Fig. 6) and contributes extensively to the interaction with taranabant (as discussed below). The occluded nature of the CB1 orthosteric pocket was predicted by a study showing that disulfide bond formation between Cys98 and Cys107 modulates orthosteric ligand binding; however, this disulfide bond is either not present or not visible in the current structure (possibly owing to cysteine capping by iodoacetamide). To assess the flexibility of the N-terminal region of CB1, we carried out a 60-ns molecular dynamics simulation of the CB1 structure embedded in an explicit POPC bilayer in the presence and absence of taranabant. In both cases, the N-terminal region was highly stable over the course of the simulation, exhibiting low root mean squared deviation (r.m.s.d.) values comparable to those of the entire transmembrane bundle (Extended Data Fig. 7). These results support the idea that the N-terminal region of CB1 will maintain a conformation similar to that observed here, even in the absence of ligand. Other lipid-activated GPCRs that have been structurally characterized (S1P1 and LPA1; refs 19 and 22, respectively) contain a disulfide-cross-linked ECL2 structure that is very similar to that of CB1; however, the N-terminal regions of these receptors are markedly different, containing α-helices that sit above the membrane and pack between ECL1 and ECL2 (Fig. 2c, d). The occluded orthosteric pocket of CB1, with the N-terminal region folding over the buried hydrophobic inverse agonist taranabant, is mirrored in the structure of the visual photoreceptor rhodopsin bound to 11-cis-retinal (Fig. 2e). A gap between TM1 and TM7 was proposed as part of a channel for uptake and release of the lipophilic 11-cis-retinal ligand, based on the structure of ligand-free opsin in an active conformation, further paralleling the structure of CB1. The opsin residues Leu40^1.35, Ile290^7.37 and Phe293^7.40 surrounding this gap are analogous to CB1 residues Ile119^1.35, Phe381^7.37 and Met384^7.40 (Extended Data Fig. 5).
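The r.m.s.d. values discussed above, and the Cα r.m.s.d. quoted for the CB1/β2AR superposition in Fig. 3a, are computed after optimally superimposing two sets of matched coordinates. Below is a minimal NumPy sketch of the standard Kabsch procedure. It assumes the two arrays already contain corresponding atoms in the same order, and it is not necessarily the exact fitting protocol used for the simulations or superpositions reported here.

```python
import numpy as np


def kabsch_rmsd(mobile: np.ndarray, reference: np.ndarray) -> float:
    """RMSD between matched (N, 3) coordinate sets after optimal rigid superposition."""
    if mobile.shape != reference.shape or mobile.ndim != 2 or mobile.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of corresponding atoms")
    # Remove the translational component by centring both sets
    p = mobile - mobile.mean(axis=0)
    q = reference - reference.mean(axis=0)
    # Optimal rotation from the SVD of the 3x3 covariance matrix
    u, _, vt = np.linalg.svd(p.T @ q)
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0  # exclude reflections
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    fitted = p @ rotation.T
    return float(np.sqrt(np.mean(np.sum((fitted - q) ** 2, axis=1))))


# Example: a rotated and translated copy of the same coordinates superimposes exactly
rng = np.random.default_rng(0)
ca = rng.normal(scale=10.0, size=(100, 3))
angle = np.deg2rad(40.0)
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
moved = ca @ rot_z.T + np.array([5.0, -3.0, 12.0])
print(round(kabsch_rmsd(moved, ca), 6))
```

For a molecular dynamics trajectory, the same function would be applied frame by frame against a fixed reference (for example, the crystal structure), using a consistent atom selection such as the N-terminal region or the transmembrane Cα atoms.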

Taranabant is a subtype-selective inverse agonist with an inhibition constant (Ki) of 0.13 nM for CB1 and a Ki of 170 nM for CB2 (ref. 25). Unambiguous electron density at the orthosteric ligand-binding pocket (Extended Data Fig. 4b, c) placed taranabant at an unusual site, towards TM1 and TM7, contrasting with the space occupied by inhibitors of other class A GPCRs, such as the β2 adrenergic receptor (Fig. 3a). Taranabant adopts a conformation in which the chlorophenyl moiety extends towards TM5, the cyanophenyl buries deeper into the seven-transmembrane bundle and the trifluoromethylpyridine projects into the putative access channel between TM1 and TM7 (Fig. 3b).
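The two inhibition constants quoted above imply roughly a 1,300-fold preference of taranabant for CB1 over CB2. The sketch below also expresses this as a binding free-energy difference, RT ln(Ki,CB2/Ki,CB1), which comes to about 4.25 kcal mol⁻¹ at 25 °C. This treats each Ki as an equilibrium dissociation constant measured under comparable conditions, which is an assumption: Ki values depend on the assay. The function names are illustrative.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def fold_selectivity(ki_cb1_nm: float, ki_cb2_nm: float) -> float:
    """How many times more tightly the ligand binds CB1 than CB2 (ratio of Ki values)."""
    return ki_cb2_nm / ki_cb1_nm


def delta_delta_g(ki_cb1_nm: float, ki_cb2_nm: float, temperature_k: float = 298.15) -> float:
    """RT ln(Ki,CB2 / Ki,CB1) in kcal/mol, treating each Ki as a dissociation constant."""
    return R_KCAL_PER_MOL_K * temperature_k * math.log(fold_selectivity(ki_cb1_nm, ki_cb2_nm))


ki_cb1, ki_cb2 = 0.13, 170.0  # nM, values quoted in the text (ref. 25)
print(f"{fold_selectivity(ki_cb1, ki_cb2):.0f}-fold; "
      f"ddG = {delta_delta_g(ki_cb1, ki_cb2):.2f} kcal/mol")
```

A difference of roughly 4 kcal mol⁻¹ is on the order of a few favourable contacts, which is consistent in magnitude with the proposal that a small set of divergent N-terminal residues could underlie the selectivity.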
Figure 3 | Binding of taranabant to the CB1 receptor. a, Cartoon of CB1 (translucent, teal) with a cartoon of β2AR (PDB accession number 2RH1; translucent, orange) superimposed. The r.m.s.d. for the Cα positions is 2.6 Å. The beta blocker carazolol is represented by yellow spheres, taranabant by magenta spheres. At the bottom is a 2D representation


The orthosteric binding pocket of CB1 is highly hydrophobic, as expected for a lipid-activated receptor. Of the 24 residues within 4 Å of the ligand, only three have polar side chains: Asp104, whose acidic side chain points towards the extracellular space; Ser123^1.39, near the access channel, which forms a polar contact with the trifluoromethyl group of taranabant; and Ser383^7.39, which has been implicated in agonist binding. By contrast, a large number of hydrophobic residues (including six Phe, three Met, two Trp, three Leu and three Ile side chains) line the orthosteric pocket and make a variety of hydrophobic contacts with taranabant, burying 1,109 Å² of surface area (Fig. 3b). All of the taranabant contact residues on CB1 are absolutely conserved across the vertebrate lineage, with the exception of Ile105, which can be replaced by Met (Extended Data Fig. 8). The major divergence between CB1 and CB2 within the subset of binding residues lies in the membrane-proximal N-terminal region, where Phe102, Met103, Asp104, Ile105 and Phe108 make van der Waals contacts with taranabant. The subtype selectivity of taranabant for CB1 may arise from the divergence of this region between CB1 and CB2.

Taranabant (Fig. 3a) and rimonabant (Fig. 4a) have related chemical structures and similar conformational properties in isolation. Docking of rimonabant into the CB1 crystal structure yielded a low-energy pose that overlaps almost completely with that of taranabant, contacting the same constellation of residues (Fig. 4a). This supports the use of the current structure to analyse the binding modes of both ligands. Mutagenesis studies have identified several residues whose mutation caused a loss in taranabant and/or rimonabant binding affinity. Indeed, many of these residues are in contact with the ligand in the CB1 structure, including Phe170^2.57, Phe174^2.61, Leu193^3.29, Trp279^5.43, Trp356^6.48, Phe379^7.35 and Leu387^7.43. However, several residues on TM3 and TM5 (for example, Lys192^3.28, Phe200^3.36 and Tyr275^5.39) are not within contact distance of taranabant and appear to make indirect contributions to binding, through structural stabilization or by influencing the conformational equilibrium of CB1.

To gain insight into the initial recognition of agonists by the CB1 receptor, we docked THC (a partial agonist) into our crystal structure coordinates using the program Glide (see Methods). The top docking
of taranabant. b, Contact residues within 4 Å of taranabant in the CB1 structure. The receptor side chains are shown as grey sticks, the backbone is a transparent cartoon and taranabant is in magenta. The top view is from within the plane of the bilayer (TM6 and TM7 cartoons removed for clarity); the bottom view is from the extracellular space.


poses have the tricyclic core of THC binding between TM1, TM2 and TM7 (as with taranabant), with the C3 alkyl chain overlapping with the chlorophenyl moiety of taranabant and extending towards Trp356^6.48 (Fig. 4b). Conformational changes in this residue and its surroundings have been proposed as a trigger for CB1 activation, and its mutation to alanine leads to enhanced stimulation (Emax) by CB1 agonists. Previous mutagenesis experiments have also identified Phe174^2.61, Leu193^3.29 and Ser383^7.39 as important residues for binding of THC or related agonists such as CP55940 (refs 26, 32). These residues are either in contact with or in close proximity to the preferred docking pose of THC. One caveat to these calculations is that the inactive structure of CB1 is not ideal for predicting high-affinity agonist interactions. It should, however, be noted that the crystallization construct (stabilized in an inactive conformation) still displays significant affinity for CP55940 (Extended Data Fig. 2). Finally, Cys355^6.47 on the bilayer-facing side of TM6 was reported to form a covalent adduct with a THC analogue that possesses a reactive group at the end of the C3-pentyl chain. Starting from our THC pose, such cross-linking would require rotation of TM6 at the orthosteric pocket during CB1 activation and consequent disruption of the packing around Trp356^6.48.

While our manuscript was under review, a crystal structure of human CB1 bound to the antagonist AM6538 was reported; AM6538 closely resembles rimonabant but has a nitrate group substituted on ‘arm 2’ of the rimonabant core (that is, the chlorophenyl moiety in Fig. 4a). Although the taranabant-bound crystal structure reported here and the AM6538-bound structure are in general agreement (Extended Data Fig. 9a), there are several differences that may be important for functional interpretation and prediction. Notably, the electron density for the ligand and the important N-terminal region is weak in the AM6538-bound structure, with high B factors in the refined model (average B = 134.3 Å² for residues 99–112 and B = 119.5 Å² for the ligand). By contrast, the equivalent region in our taranabant-bound structure is very well ordered, with good density and much lower B factors (average B = 61.7 Å² for residues 100–112 and B = 42.0 Å² for the ligand) (Extended Data Fig. 9b, c). The lack of clear density and resulting model ambiguity for the N-terminal region in the AM6538-bound structure
Figure 4 | Docking of rimonabant and THC to the CB1 receptor. a, Overlay of the crystal structure pose of taranabant (transparent magenta sticks) with the top-scoring docking pose of rimonabant, shown as orange sticks (see Methods). The contact residues within 4 Å of taranabant are shown as transparent teal sticks. The 2D structure of rimonabant is shown at upper left. b, Top-scoring docking pose of THC, shown as light green sticks, along with taranabant (transparent magenta sticks). Selected residues important for the binding of THC and agonist activity are shown as teal sticks. The TM7 cartoon is removed for clarity. The 2D structure of THC is shown at bottom right.


may limit its utility for predicting the binding modes of other ligands. 
This is apparent in the erroneous docking prediction for taranabant, 
in which arm 1 and arm 2 (chlorophenyl and cyanophenyl groups) are 
swapped relative to their experimentally determined binding positions 
reported herein. Further biochemical and computational studies will be 
required to establish the relative utility of these two crystal structures 
as templates for ligand docking and design. 
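The average B factors compared above are per-atom B values averaged over a residue range or over the ligand atoms. The sketch below shows that averaging from PDB-format ATOM/HETATM records, using the fixed column layout of the PDB format (chain identifier in column 22, residue number in columns 23–26, B factor in columns 61–66). It is an assumption that the published averages were computed this way; whether hydrogens or alternate conformers were included is not stated here. The function names are hypothetical.

```python
from statistics import mean
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str, int, float]  # (record, chain, residue name, residue number, B)


def parse_atoms(lines: Iterable[str]) -> Iterator[Record]:
    """Yield ATOM/HETATM fields using the fixed PDB column layout."""
    for line in lines:
        record = line[:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        yield (record, line[21], line[17:20].strip(), int(line[22:26]), float(line[60:66]))


def mean_b_residue_range(lines: Iterable[str], chain: str, first: int, last: int) -> float:
    """Average B over all protein atoms of residues first..last (inclusive) in a chain."""
    values = [b for rec, ch, _, seq, b in parse_atoms(lines)
              if rec == "ATOM" and ch == chain and first <= seq <= last]
    if not values:
        raise ValueError("no ATOM records in that residue range")
    return mean(values)


def mean_b_ligand(lines: Iterable[str], resname: str) -> float:
    """Average B over all HETATM records with the given three-letter residue name."""
    values = [b for rec, _, name, _, b in parse_atoms(lines)
              if rec == "HETATM" and name == resname]
    if not values:
        raise ValueError(f"no HETATM records named {resname}")
    return mean(values)
```

With a model file read into a list of lines, the comparison above would correspond to calls such as `mean_b_residue_range(lines, chain, 100, 112)` and `mean_b_ligand(lines, code)`, where the chain identifier and three-letter ligand code are whatever the deposited model uses. mmCIF files would need a different parser.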

GPCRs adopt multiple conformations, creating a complex energy landscape that allows the binding of different ligands to modulate different intracellular effectors, such as G proteins and arrestin. CB1 has considerable agonist-independent constitutive activity and exhibits subtle and paradoxical pharmacological properties: it is antagonized by cannabidiol (a molecule that, but for a bond disconnection, is near-identical to THC) and inhibited by the compound ORG27569, which allosterically increases agonist affinity but decreases subsequent receptor activation. Understanding these phenomena will require additional structures of CB1 in different conformational states, bound to a range of ligands (both orthosteric and allosteric) of differing efficacy. Our structure of CB1 bound to taranabant represents a step in this direction and provides a crystallographic basis for computational design of cannabinoid system modulators.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Cloning, expression and purification. The wild-type human CB1 receptor gene (UniProt entry: P21554) was cloned into a modified pFastBac (Invitrogen) baculovirus expression vector with the haemagglutinin (HA) signal sequence followed by a Flag epitope tag at the N terminus and a 10×His tag at the C terminus. To facilitate receptor crystallization, the 76 N-terminal residues were removed, a TEV protease recognition site was introduced before residue Lys90, and the 51 C-terminal residues were deleted (truncation after Pro421). Residues 302-332 in the CB1 intracellular loop 3 (ICL3) were replaced with a synthetic DNA fragment containing the 196-amino-acid coding sequence of Pyrococcus abyssi glycogen synthase (PGS; PDB accession number: 2BFW). Finally, the mutation T210A was introduced by an adapted Multi-site QuikChange protocol (Stratagene).

The final CB1(T210A)-PGS fusion construct was transformed into DH10Bac cells to produce a recombinant baculovirus with the Bac-to-Bac system (Invitrogen). The recombinant baculovirus was used to infect Sf9 insect cell culture at a cell density of 2.5 × 10⁶ cells per ml, with 1 μM taranabant (Tocris) added to the medium. Infected cells were grown for 60 h at 27°C before harvesting, and the cell pellets were stored at −80°C for future use.

Sf9 cell membranes were disrupted by thawing frozen cell pellets in a hypotonic buffer containing 10 mM Tris pH 7.5, 1 mM EDTA, 160 μg ml−1 benzamidine, 100 μg ml−1 leupeptin, 2 mg ml−1 iodoacetamide and 1 μM taranabant. The cell membranes were centrifuged at 10,000g for 20 min at 4°C. Membrane pellets were solubilized in a buffer containing 50 mM HEPES, pH 7.5, 500 mM NaCl, 1% (w/v) n-dodecyl-β-D-maltopyranoside (DDM; Anatrace), 0.2% sodium cholate, 0.2% cholesteryl hemisuccinate (CHS), 10% glycerol, 160 μg ml−1 benzamidine, 100 μg ml−1 leupeptin, 2 mg ml−1 iodoacetamide and 10 μM taranabant for 1 h at 4°C. The supernatant was isolated after ultracentrifugation for 30 min at 100,000g and incubated with Ni-NTA agarose beads (GE Healthcare) in batch for 3 h at 4°C. After binding, the beads were collected by centrifugation at 100g and washed with five volumes of Ni-NTA wash buffer (50 mM HEPES, pH 7.5, 500 mM NaCl, 0.05% (w/v) DDM, 0.01% sodium cholate, 0.01% CHS, 10% glycerol, 50 mM imidazole, 160 μg ml−1 benzamidine, 100 μg ml−1 leupeptin and 1 μM taranabant). After transfer to a gravity column, beads were washed with 15 column volumes of Ni-NTA wash buffer, and receptor protein was eluted in Ni-NTA wash buffer with 200 mM imidazole and 2 mM calcium. The eluted protein was then loaded by gravity flow over anti-Flag M1 affinity resin. Detergent was exchanged from 0.05% DDM to 0.05% lauryl maltose neopentyl glycol (LMNG) on the M1 resin. The pure receptor was then eluted with 0.2 mg ml−1 Flag peptide and 5 mM EDTA. TEV protease (1:10 w/w) and PNGase F were added to the eluate, and protein was incubated at 4°C overnight. Finally, the receptor was run on a Superdex 200 size-exclusion column (GE Healthcare) with buffer containing 20 mM HEPES, pH 7.5, 150 mM NaCl, 0.05% LMNG, and 1 μM taranabant.

Differential scanning fluorimetry. Protein samples were purified and prepared in the absence of ligand (apo), with taranabant, or with rimonabant, as described above. Differential scanning fluorimetry assays were performed in 96-well PCR plates using a real-time PCR machine (CFX96, Bio-Rad). Standard assay conditions (25 μl) were 25 mM HEPES pH 7.5, 150 mM NaCl, 0.01% LMNG, 0.002% CHS and 10 μM of the corresponding ligand. The protein concentration was 2 μM and the BODIPY FL-L-cystine dye was added at 2 μM final concentration. All reactions were incubated at 4°C for 20 min before scanning in the PCR machine. The fluorescence was measured at 0.5°C intervals from 4°C to 90°C using the FAM filter set (450-490 nm excitation, 515-530 nm emission).
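
The first-derivative analysis shown in Extended Data Fig. 1b is commonly used to read off an apparent melting temperature (Tm) as the temperature at which the fluorescence rises most steeply. The following is a minimal sketch, assuming a trace sampled every 0.5°C from 4°C to 90°C as described above; the function names and the synthetic sigmoidal trace (with a hypothetical Tm of 62°C) are illustrative only and are not data or code from this study.

```python
import math


def first_derivative(temps, signal):
    """Finite-difference dF/dT: central differences inside, one-sided at the ends."""
    n = len(temps)
    deriv = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        deriv.append((signal[hi] - signal[lo]) / (temps[hi] - temps[lo]))
    return deriv


def melting_temperature(temps, signal):
    """Apparent Tm: the sampled temperature at which dF/dT is largest."""
    deriv = first_derivative(temps, signal)
    i_max = max(range(len(deriv)), key=deriv.__getitem__)
    return temps[i_max]


# Synthetic unfolding trace (hypothetical Tm of 62.0 °C), sampled every 0.5 °C
# from 4 °C to 90 °C to match the scan settings described above.
temps = [4.0 + 0.5 * k for k in range(173)]
signal = [1.0 / (1.0 + math.exp((62.0 - t) / 2.0)) for t in temps]
print(melting_temperature(temps, signal))  # 62.0
```

Taking the maximum of the derivative on the sampled grid limits the Tm precision to the 0.5°C step; fitting a smooth function around the peak would be a refinement.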

Crystallization. Purified receptor was concentrated to 55 mg ml−1 using a 100-kDa cut-off Vivaspin column (Sartorius), and crystallized using the lipidic cubic phase (LCP) method. The concentrated receptor was reconstituted into a lipid mixture containing monoolein plus 10% (w/w) cholesterol (Sigma), at a ratio of 2:3 receptor to lipid (by weight). Mixing was performed at room temperature using a syringe mixing apparatus as previously described. The mesophase was dispensed in 40-nl drops onto 96-well glass plates and overlaid with 800 nl precipitant solution using a Gryphon LCP robot (Art Robbins Instruments). Crystals grew to full size after 2 weeks at 20°C in the following overlay precipitant condition: 31% PEG 400, 100 mM sodium citrate pH 5.5, 100 mM magnesium sulfate. The crystals were harvested from LCP setups using MiTeGen loops and flash-frozen in liquid nitrogen.

Data collection and processing. X-ray diffraction data were collected at GM/CA-CAT beamline 23ID-B at the Advanced Photon Source (APS), Argonne National Laboratory, equipped with an Eiger 16M detector. Datasets were acquired using a beam size of 20 μm with 1.033-Å wavelength X-rays. For each crystal, fifty 0.4° oscillation images were collected, with 1-s exposure and without attenuation of the beam. Because radiation damage limited the data obtainable from each crystal, a 97% complete diffraction dataset was merged from 42 crystals and scaled using HKL3000. The dataset was processed in space group P2₁, and a resolution cut-off of 2.6 Å was selected by examining CC1/2 values after anisotropy correction in HKL3000.

Structure determination and refinement. The structure of CB1(T210A)-PGS in complex with taranabant was solved by molecular replacement with Phaser using the human S1P1 receptor (PDB: 3V2Y) and PGS (PDB: 2BFW) as independent search models. The solution was improved through iterations of manual building in Coot, followed by refinement using Refmac5 (ref. 44). Translation-libration-screw refinement was used to model atomic displacement factors. Refinement parameters for the taranabant ligand were generated using the PRODRG web server. The resulting statistics for data collection and refinement are included in Extended Data Table 1. The final structure had 96.6% of residues in the favoured region of the Ramachandran plot, 3.4% in the allowed region, and no residues in the disallowed region. Figures were prepared using PyMOL (Schrödinger LLC). The electrostatic potential surface shown in Fig. 1c was calculated using APBS.

Binding of ligands to the CB1 receptor. Ligand-binding experiments on membranes containing wild-type CB1, CB1-PGS, and CB1(T210A)-PGS were carried out based on a previously published protocol. Sf9 cells expressing each construct (without any ligand present) were used to generate membranes by Dounce homogenization and differential centrifugation. Saturation binding was carried out by incubating 1.5-5 μg of membranes with different concentrations of [3H]SR141716A (54 Ci mmol−1; PerkinElmer) between 0.05 and 25.6 nM in assay buffer (25 mM Tris pH 7.5, 5 mM MgCl2, 1 mM EDTA) containing 0.1% protease-free BSA in a final volume of 250 μl per tube. Reactions were incubated at 30°C for 1 h and then quenched with 250 μl assay buffer with 5% BSA. Non-specific binding was determined using reactions containing 1 μM unlabelled ligand. Reactions were filtered on a vacuum manifold through GF/C filters (pre-soaked in assay buffer supplemented with 0.5% polyethylenimine) to retain membranes and discard unbound ligand. After washing four times with cold assay buffer, bound radioactivity was quantified using a scintillation counter. For competition-binding experiments, aliquots of membranes were incubated with 3 nM [3H]SR141716A, and varying concentrations of competitor ligands (taranabant or CP55940) were included in the binding reactions. All binding experiments were carried out as three independent experiments, each performed in duplicate. Data analysis and fitting were performed with GraphPad Prism (GraphPad Software Inc.).
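
Saturation binding of this kind is commonly described by a one-site model, B = Bmax·[L]/(Kd + [L]), and a competitor's IC50 measured against a fixed radioligand concentration is commonly converted to Ki with the Cheng-Prusoff relation, Ki = IC50/(1 + [L]/Kd). The following is a minimal sketch of those two equations; whether the authors' Prism fits used exactly these forms is an assumption, the function names are hypothetical, and the IC50 value below is invented for illustration (3 nM radioligand is from the protocol above, and Kd = 4.8 nM is the wild-type value in Extended Data Fig. 2a).

```python
def one_site_specific(conc_nm, bmax, kd_nm):
    """Specific binding to one class of sites: B = Bmax * [L] / (Kd + [L])."""
    return bmax * conc_nm / (kd_nm + conc_nm)


def cheng_prusoff_ki(ic50_nm, radioligand_nm, kd_nm):
    """Convert a competition IC50 into Ki: Ki = IC50 / (1 + [L] / Kd)."""
    return ic50_nm / (1.0 + radioligand_nm / kd_nm)


# When [L] equals Kd, half of the sites are occupied.
print(one_site_specific(4.8, bmax=1.0, kd_nm=4.8))  # 0.5
# Hypothetical IC50 of 10 nM, with 3 nM [3H]SR141716A and Kd = 4.8 nM.
print(round(cheng_prusoff_ki(10.0, 3.0, 4.8), 2))  # 6.15
```

The Cheng-Prusoff correction assumes that competitor and radioligand bind competitively to the same single site at equilibrium; the Ki is always smaller than the IC50 whenever radioligand is present.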

Molecular dynamics simulations. The system used for molecular dynamics simulation consisted of one copy of the CB1 receptor (PGS domain removed), taranabant, 240 POPC (1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine) molecules, 48 Na+, 57 Cl−, and 17,087 water molecules. Molecular dynamics simulations were performed with periodic boundary conditions to produce isothermal-isobaric ensembles using the modified PMEMD.CUDA program in AMBER 14 (ref. 47). Temperature was regulated using Langevin dynamics with a collision frequency of 5 ps−1. Pressure was regulated using the isotropic position scaling algorithm with the pressure relaxation time set to 1.0 ps. The equations of motion were integrated with a time step of 1 fs for the relaxation phase and 2 fs for the equilibration and sampling phases. After a 5-ns equilibration, a 55-ns molecular dynamics simulation was performed at 298 K and 1 bar to produce constant temperature and pressure ensembles. The transmembrane helices were very stable in both simulations (with and without taranabant; Extended Data Fig. 7), and the mean r.m.s.d. values were 1.52 ± 0.13 Å and 1.45 ± 0.23 Å for the apo and complex forms, respectively. The r.m.s.d. values of the membrane-proximal N-terminal region of the complex form (0.96 ± 0.24 Å) were smaller than those of the apo form (1.28 ± 0.19 Å).
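
The r.m.s.d. values above summarize how far atomic positions deviate from a reference structure. The following is a minimal sketch of the calculation, assuming matched atom lists that have already been superimposed on the reference (trajectory-analysis tools normally perform this fitting first, for example on Cα atoms; that step is omitted here) and assuming the quoted ± values are standard deviations over frames. The function names are hypothetical.

```python
import math
import statistics


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between matched (x, y, z) coordinates in Å.

    Assumes both sets are already superimposed; no fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def rmsd_summary(reference, frames):
    """Mean and sample standard deviation of per-frame r.m.s.d. to a reference."""
    values = [rmsd(reference, frame) for frame in frames]
    return statistics.mean(values), statistics.stdev(values)


ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
frame = [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0)]  # every atom shifted by 1 Å
print(rmsd(ref, frame))  # 1.0
```

The same measure applies to comparing a docked pose with a crystal pose; because both poses share the receptor frame in that case, no superposition step is needed.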

Docking of rimonabant and THC. Molecular docking was performed for taranabant, rimonabant, and THC using Glide, implemented in the Schrödinger software package (http://www.schrodinger.com). Different protocols of receptor preparation, grid generation and flexible ligand docking were evaluated, and the one that produced the best docking scores was adopted. The optimal Glide protocol for CB1 included optimizing only hydrogen atoms during receptor preparation; allowing the hydroxyl and thiol groups of Thr197, Ser383 and Cys386 to rotate; and using the standard precision scoring function. We first tested our docking protocol by re-docking the taranabant ligand from the crystal structure. The best docking scores were −12.76 and −12.59 kcal mol−1 for the crystal conformation and for a 3D conformation generated without any initial bias using the Concord program (http://www.certara.com), respectively. The r.m.s.d. between the crystal structure and the docking pose was 0.55 Å for the Concord conformation. Next, the antagonist rimonabant and the partial agonist THC were docked to the binding pocket using the same protocol. The docking scores of the best docking poses were −8.99 and −9.36 kcal mol−1 for rimonabant and THC, respectively.

Data availability. Atomic coordinates and structure factors for the reported 
crystal structure have been deposited in the Protein Data Bank (PDB) under the 
accession code 5U09. All other data are available from the corresponding author 
upon reasonable request. 
Extended Data Figure 1 | Differential scanning fluorimetry on purified CB1-PGS.


a, Raw differential scanning fluorimetry traces of the receptor in 
the apo state or bound to each antagonist. b, First derivative analysis of data in a. 
Extended Data Figure 2 | Ligand-binding properties of CB1 constructs. a, Saturation binding of the antagonist [3H]SR141716A (tritiated rimonabant radioligand) to wild-type CB1, CB1-PGS, and CB1(T210A)-PGS. Error bars represent s.d. for three separate experiments, each performed in duplicate. The fitted Kd values (± s.e.m.) for these three constructs are 4.8 ± 0.7 nM, 6.3 ± 0.6 nM, and 4.4 ± 0.5 nM, respectively. b, Competition binding of taranabant to the wild-type CB1 receptor, CB1-PGS, and CB1(T210A)-PGS. Error bars represent s.d. for three separate experiments, each performed in duplicate. The Ki values (± s.e.m.) of the three constructs for taranabant are 0.94 ± 0.17 nM, 1.10 ± 0.16 nM, and 0.91 ± 0.16 nM, respectively. c, Competition binding of the agonist CP55940 to the wild-type CB1 receptor, CB1-PGS, and CB1(T210A)-PGS. Error bars represent s.d. for three separate experiments, each performed in duplicate. The Ki values (± s.e.m.) of the three constructs for CP55940 are 53 ± 12 nM, 230 ± 43 nM, and 384 ± 62 nM, respectively.
Extended Data Figure 3 | Purification and crystallization of CB1(T210A)-PGS. a, Superdex 200 gel-filtration trace of receptor after Ni immobilized metal-affinity chromatography (IMAC) and M1 anti-Flag chromatography (see Methods). b, SDS-PAGE analysis of samples at different stages of purification. The five lanes from left to right are: markers (molecular mass in kDa at left); IMAC/Flag-purified receptor; same sample after treatment with PNGase F; receptor after TEV protease cleavage (removing 89 N-terminal amino acids); final sample after Superdex 200 gel filtration. c, Light microscopy image showing examples of LCP microcrystals of CB1(T210A)-PGS used to collect diffraction data.
Extended Data Figure 4 | Packing and electron density in the CB1(T210A)-PGS crystals. a, Lattice packing interactions in the monoclinic crystals of CB1(T210A)-PGS. Protomers are shown as ribbons, with the receptor component of the fusion protein coloured teal and the PGS domain coloured grey. b, 2Fo − Fc electron density map (contoured at 1.2σ) of taranabant and the surrounding ligand-binding residues. Protein and ligand are represented as yellow sticks. c, Stereo view of 2Fo − Fc electron density (contoured at 1.5σ) for only the ligand taranabant (magenta sticks).
Extended Data Figure 5 | Residues lining the putative lipid access channel of CB1. The receptor is shown as a teal transparent surface, and taranabant
is in magenta spheres. The three residues lining the channel are shown as orange sticks and their solvent-accessible surfaces are coloured orange. 
[Alignment not reproduced: membrane-proximal N-terminal segment of CB1 from human (residues 90-112), horse (90-112), bovine (90-112), cat (90-112), mouse (91-113), chicken (92-114), frog (90-112) and zebrafish (98-120).]

Extended Data Figure 6 | Sequence alignment of the membrane-proximal N-terminal region of CB1 from different vertebrate species. ‘Frog’ is Xenopus laevis. The red bar (top) indicates the part of this region that is structured and visible in the electron density of the CB1 crystals. The blue box denotes positions that make contact with taranabant. Alignment was performed using Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/).
Extended Data Figure 7 | Molecular dynamics simulation of the CB1 structure. a, A 60-ns molecular dynamics simulation of the CB1 receptor (after removing the PGS fusion protein) with taranabant present. Black trace is for the entire receptor, red trace is for only the structured membrane-proximal N-terminal region. b, 60-ns molecular dynamics simulation of the CB1 receptor without a ligand present. Black trace is for the entire receptor, red trace is for only the structured membrane-proximal N-terminal region.
[Alignment not reproduced: full-length CB1 from human (residues 1-472), mouse (1-473), chicken (1-473) and zebrafish (1-475), aligned with human CB2 (1-360), with transmembrane-helix labels.]

Extended Data Figure 8 | Alignment of the full-length CB1 sequence from several different species, along with human CB2. The blue boxes denote positions that make contact with taranabant within a 4 Å cut-off. The alignment was performed using Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/).
Extended Data Figure 9 | Comparison of the structures of CB1 bound to taranabant and CB1 bound to AM6538 (ref. 34; PDB accession 5TGZ). a, Superposition of the two CB1 structures viewed from the extracellular space. The taranabant-bound structure is shown as a teal cartoon (ligand as magenta sticks), while the AM6538-bound structure is shown as a gold cartoon (ligand as green sticks). b, Comparison of 2Fo − Fc electron density (contoured at 1.5σ) for the ligands in each structure. On the left is taranabant from the current structure; on the right is AM6538 from ref. 34. c, Comparison of the membrane-proximal N-terminal regions in each structure. On the left is a side view of CB1 from the current structure, with 2Fo − Fc electron density (contoured at 1.0σ) shown for the N-terminal region, TM1, and taranabant. On the right is the analogous side view of CB1 from ref. 34 (gold cartoon), with 2Fo − Fc electron density (contoured at 1.0σ) shown for the N-terminal region, TM1 and AM6538.
Extended Data Table 1 | Data collection and refinement statistics

                                      CB1-PGS with taranabant*
Data collection
  Space group                         P2₁
  Cell dimensions
    a, b, c (Å)                       50.7, 80.4, 81.2
    β (°)                             91.7
  Resolution (Å)                      50.00-2.60 (2.69-2.60)†
  Rsym or Rmerge                      0.19 (NA‡)
  I/σI                                7.43 (0.96)
  Completeness (%)                    96.8 (96.9)
  Redundancy                          5.4 (5.1)
  CC1/2 in highest shell              0.69
Refinement
  Resolution (Å)                      50-2.60
  No. reflections                     11084
  Rwork/Rfree                         0.19/0.23
  No. atoms
    Protein                           3762
    Ligand/ion                        56
    Other (lipid and water)           125
  B-factors
    Receptor                          45.5
    Fusion protein                    38.3
    Ligand                            42.0
    Ion                               91.0
    Other (lipid and water)           44.4
  R.m.s. deviations
    Bond lengths (Å)                  0.008
    Bond angles (°)                   1.20

*Diffraction data from 42 crystals were merged into a single dataset.
†Values in parentheses are for the highest-resolution shell.
‡Rmerge > 1 is statistically meaningless; Scalepack does not report it.
Structure of RNA polymerase I transcribing ribosomal DNA genes


Simon Neyer, Michael Kunz, Christian Geiss, Merle Hantsche, Victor-Valentin Hodirnau, Anja Seybert, Christoph Engel, Margot P. Scheffer, Patrick Cramer & Achilleas S. Frangakis


RNA polymerase I (Pol I) is a highly processive enzyme that transcribes ribosomal DNA (rDNA) and regulates growth of eukaryotic cells. Crystal structures of free Pol I from the yeast Saccharomyces cerevisiae have revealed dimers of the enzyme stabilized by a ‘connector’ element and an expanded cleft containing the active centre in an inactive conformation. The central bridge helix was unfolded and a Pol-I-specific ‘expander’ element occupied the DNA-template-binding site. The structure of Pol I in its active transcribing conformation has yet to be determined, whereas structures of Pol II and Pol III have been solved with bound DNA template and RNA transcript. Here we report structures of active transcribing Pol I from yeast solved by two different cryo-electron microscopy approaches. A single-particle structure at 3.8 Å resolution reveals a contracted active centre cleft with bound DNA and RNA, and a narrowed pore beneath the active site that no longer holds the RNA-cleavage-stimulating domain of subunit A12.2. A structure at 29 Å resolution that was determined from cryo-electron tomograms of Pol I enzymes transcribing cellular rDNA confirms contraction of the cleft and reveals that incoming and exiting rDNA enclose an angle of around 150°. The structures suggest a model for the regulation of transcription elongation in which contracted and expanded polymerase conformations are associated with active and inactive states, respectively.

The structure of Pol I, which consists of 14 subunits with a total molecular weight of 590 kDa, has been previously described in atomic detail in an inactive conformation using X-ray crystallography. To determine the structure of transcribing Pol I, we performed single-particle cryo-electron microscopy (single-particle cryo-EM) with a reconstituted yeast Pol I elongation complex containing a DNA-RNA scaffold (Fig. 1a, Extended Data Fig. 1), similar to that used to study transcribing mammalian Pol II. Particle classification enabled us to reconstruct the Pol I elongation complex structure at 3.8 Å resolution from approximately 94,000 single particles (Fig. 1c, Extended Data Figs 2, 3). The electron density revealed the downstream DNA, the DNA-RNA hybrid (Fig. 1b), and all Pol I domains with the exception of the flexibly linked C-terminal domain of subunit A49 (refs 12, 13) and the C-terminal domain of subunit A12.2. An atomic model was obtained by fitting rigid domains of the Pol I crystal structure, positioning nucleic acids from the bovine Pol II elongation complex structure, and manually rebuilding regions that were structurally altered (Extended Data Table 1).

Comparison of the resulting Pol I elongation complex structure with the previous Pol I structure reveals that the active centre cleft is contracted by up to 13 Å (Fig. 2a). Contraction occurs through relative movement of the two major polymerase modules, ‘core’ and ‘shelf’, as predicted. The shelf module moves together with the clamp domain as a single ‘shelf-clamp’ unit, slightly rotating with respect to the core module (Fig. 2e, Supplementary Video 1). Another module, the ‘jaw-lobe’, moves closer to downstream DNA by up to 7 Å (Fig. 2a, d). Comparison of the Pol I elongation complex with elongation complex structures of Pol II, Pol III, and bacterial RNA polymerase reveals that all of these polymerases adopt a similar contracted conformation in their transcribing state and underscores the fundamental structural and mechanistic similarity of cellular RNA polymerases from bacteria to eukaryotes (Extended Data Fig. 4).

In the elongation complex structure, the connector is detached from Pol I, as observed when Pol I is bound to the initiation factor Rrn3 (refs 13, 17). The expander is also displaced, enabling Pol I to form extensive interactions with the DNA-RNA hybrid (Fig. 1a). The enzyme contacts the DNA template at positions +4 to −9 and the RNA transcript at positions −1 to −8 (+1 represents the nucleotide addition site). Pol I generally binds nucleic acids with the same elements as Pol II, but uses several Pol-I-specific residues to contact the upstream part of the DNA-RNA hybrid. The active centre adopts a catalytically competent conformation. The bridge helix is folded throughout (Fig. 2b, c). The trigger loop has weaker electron density, indicating higher mobility. The tip of the trigger loop lacks density, allowing for binding of the nucleoside triphosphate substrate. The polymerase switch regions and cleft loops adopt positions similar to those in the Pol II elongation complex, except that fork loop 1 is bent away from the hybrid (Extended Data Fig. 5a), as in the Pol III elongation complex and in a Pol II initiation intermediate.

The Pol I elongation complex structure also provides insights into the regulation of the intrinsic RNA cleavage activity of Pol I. RNA cleavage requires subunit A12.2 (refs 19, 20), which consists of two domains. The N-terminal domain resembles that of the Pol II subunit Rpb9, whereas the C-terminal domain corresponds to the catalytic domain of the Pol II RNA cleavage factor TFIIS. In the elongation complex structure, the N-terminal domain of A12.2 remains at the outer rim of the Pol I funnel region, whereas its C-terminal domain is displaced from the pore that it occupies in the Pol I crystal structures. Displacement of the A12.2 C-terminal domain from the pore apparently occurs during cleft contraction, because modelling of this domain in the pore results in a clash with the contracted shelf module (Extended Data Fig. 5b, c). Thus A12.2 can only enter the active centre when the cleft is fully or partially expanded. This predicts that Pol I adopts a partially expanded conformation during A12.2 action, which is required for RNA proofreading and polymerase reactivation after backtracking.

To investigate whether the structural differences between the Pol I elongation complex and the free Pol I dimer arise from nucleic acid binding or from conversion of a dimer to a monomer, we also solved the structure of monomeric Pol I in the absence of nucleic acids at 4.0 Å resolution using approximately 80,000 single particles from the same data set (Extended Data Figs 2, 6a, Methods). In this structure, the connector and expander were also displaced, but the cleft was only partially contracted, similarly to the Pol I-Rrn3 complex (Extended Data Fig. 6b, c). The central bridge helix remained partially unwound, and the C-terminal domain of A12.2 remained in the pore (Extended Data Fig. 6d-f), confirming that the partially expanded conformation is required for A12.2-dependent RNA cleavage.

Thus, conversion of the Pol I dimer to a monomer leads to a partially expanded conformation, but not to the fully contracted active conformation. The partially expanded conformation resembles the conformation observed when the enzyme adopts a paused or an inhibited state (Extended Data Fig. 6g). In both the bacterial polymerase and Pol I, movement of a rigid shelf-clamp unit away from the core module allows for expansion of the cleft and a coordinated widening of the pore (called the ‘secondary channel’ in bacterial RNA polymerase). This movement involves a slight rotation of the shelf-clamp unit with respect to the core module, reflected in the term ‘ratcheting’ used in one study of the bacterial polymerase.

Available data thus suggest that RNA polymerases can adopt partially expanded and contracted conformations that are associated with inactive and active states, respectively. Binding of nucleic acids in the cleft apparently maintains the contracted conformation and excludes A12.2 from the pore, whereas rearrangements in the nucleic acids upon misincorporation or pausing could induce the partially expanded conformation that is transcriptionally inactive but enables A12.2 entry into the pore and enzyme reactivation by RNA cleavage. According to this model, transcription elongation can be regulated by allosteric coupling of nucleic acid binding with cleavage factor binding in the cleft and pore, respectively, via contraction and expansion of the polymerase.

To investigate the physiological relevance of the single-particle cryo-EM structure, we further determined the structure of the natural Pol I elongation complex that forms in yeast cells by promoter-dependent initiation on rDNA with the use of cryo-electron tomography (Fig. 3). We spread active rDNA genes from exponentially growing yeast cells onto an electron microscopy grid such that they formed ‘Miller trees’ (Extended Data Fig. 7). To overcome previous limitations in sample preparation, we used instant plunge-freezing to keep the sample in a close-to-native environment. The obtained images revealed the detailed arrangement of Pol I enzymes along rDNA, nascent RNA emerging from Pol I, and large densities at the RNA ends that resemble 5′ classical knobs (Fig. 3a, b). From the cryo-electron tomography images, we selected 11 complete Miller trees and several smaller Pol I trails, each trail containing 10-20 Pol I enzymes with associated RNA. This yielded 993 transcribing Pol I enzymes for further analysis.

Figure 1 | Single-particle cryo-EM structure of yeast Pol I elongation complex at 3.8 Å resolution. a, Nucleic acid scaffold and interactions between Pol I and nucleic acid. Template DNA, non-template DNA and RNA are shown in medium blue, sky blue, and red, respectively. Filled circles represent nucleotides that were well resolved in the electron density. Pol I residues within 4 Å distance are depicted together with the subunit identifier (A for A190, B for A135). b, Electron density for the DNA-RNA hybrid with the final model superimposed. The active site metal ion A is depicted as a magenta sphere. c, Ribbon model of the Pol I elongation complex. The view is from the ‘front’ with the incoming downstream DNA pointing towards the reader. Subunits are coloured according to the standard polymerase subunit colouring: A190, grey; A135, wheat; A49, light blue; A43, slate; AC40, red; A34.5, pink; Rpb5, magenta; Rpb6, silver-blue; AC19, yellow; Rpb8, green; A14, hot pink; A12.2, orange; Rpb10, blue; Rpb12, lemon. Template DNA, non-template DNA and RNA are depicted in medium blue, sky blue and red, respectively.

We observed that each rDNA gene is loaded with ~70 Pol I enzymes, 
which showed a median centre-to-centre distance of 18 ± 10 nm (Fig. 3c,
Extended Data Fig. 8a), consistent with previous results”°. Only ~2% 
of the Pol I complexes were separated by a distance of less than 12 nm, 
which would allow for interaction between enzymes. Furthermore, 
consecutive enzymes show random relative orientations, arguing 
against specific interactions that were suggested previously”’. The 
incoming and exiting rDNA enclose an angle of ~150° measured at 
each triple of successive Pol I molecules (Fig. 3c, Extended Data Fig. 8b). 
This angle was independent of the length of the DNA between enzymes 
and could not be obtained from the single-particle cryo-EM structure, 
because density for upstream DNA was poor. 
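The statistics in this paragraph (centre-to-centre distances between consecutive enzymes, the share of pairs closer than 12 nm, and the angle at each triple of successive enzymes) can all be derived from an ordered list of enzyme centres. The following is a minimal NumPy sketch, assuming the centres are already indexed in order along the rDNA and given in nanometres; the function names and example coordinates are hypothetical and do not reproduce the authors' scripts.

```python
import numpy as np


def consecutive_distances(centres):
    """Euclidean centre-to-centre distances between consecutive enzymes (input units)."""
    c = np.asarray(centres, dtype=float)
    return np.linalg.norm(np.diff(c, axis=0), axis=1)


def triple_angles(centres):
    """Angle (degrees) at the central enzyme of each successive triple, between the
    vectors pointing to its upstream and downstream neighbours."""
    c = np.asarray(centres, dtype=float)
    to_up = c[:-2] - c[1:-1]    # central -> upstream neighbour
    to_down = c[2:] - c[1:-1]   # central -> downstream neighbour
    cos = np.sum(to_up * to_down, axis=1) / (
        np.linalg.norm(to_up, axis=1) * np.linalg.norm(to_down, axis=1)
    )
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))


# Hypothetical centres (nm) of four enzymes ordered along one rDNA gene.
bend = np.radians(30.0)
centres = [
    [0.0, 0.0, 0.0],
    [18.0, 0.0, 0.0],
    [36.0, 0.0, 0.0],
    [36.0 + 18.0 * np.cos(bend), 18.0 * np.sin(bend), 0.0],
]
d = consecutive_distances(centres)
print("median distance (nm):", np.median(d))
print("fraction closer than 12 nm:", np.mean(d < 12.0))
print("angles at each triple (deg):", triple_angles(centres))
```

Three points always define a plane, so the angle at the central enzyme is also the in-plane angle; with real data the values would be pooled over all Miller trees before taking medians or histograms.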

After classification we performed sub-tomogram averaging (n = 225) to obtain a cryo-electron tomography structure of the cellular transcribing Pol I at a resolution of ~29 Å (Fourier shell correlation (FSC) 0.5 criterion; ~25 Å with the FSC 0.143 criterion; Extended Data Fig. 9a). The cryo-electron tomography structure strongly resembled the single-particle cryo-EM structure, showing an overall cross-correlation score of 0.85. An FSC plot between the cryo-electron tomography and single-particle cryo-EM structures decreased beyond the 0.143 threshold at 31 Å (Extended Data Fig. 9a). The peripheral subcomplexes A14–A43 and A34.5–A49 were flexible, consistent with the weaker density observed in the single-particle cryo-EM structure (Extended Data Fig. 2). The width of the active centre cleft was the same in both structures (Fig. 3d), confirming that the contracted single-particle cryo-EM structure represents the natural conformation of actively transcribing Pol I (Extended Data Fig. 9b).

© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.

[Figure 2 image: front and side views of free Pol I and the Pol I EC; panel labels include A190, E414, Y1018, M1000, hybrid/expander, shelf module and downstream DNA.]

Figure 2 | Cleft contraction and module movements. a, Comparison of structures of the Pol I elongation complex (EC) (orange) and free dimeric Pol I (PDB: 4C2M; black) after superposition of their A135 subunits. Cleft width was measured between subunit A190 residue E414 and subunit A135 residue K434. For clarity, only subunits A190 and A135 are displayed. b, Electron density of the folded bridge helix in the Pol I elongation complex. c, Comparison of bridge helices in the elongation complex (orange) and free Pol I (black). d, Pol I elongation complex ribbon model coloured by four mobile modules. The peripheral subcomplexes A14–A43 and A49–A34.5 are omitted for clarity. e, Movements of polymerase modules upon cleft contraction. Ribbon models of free Pol I (grey) and elongation complex are shown after superposition of their core modules (omitted). Arrows indicate movement and rotation of the clamp-shelf and the jaw-lobe modules. Colour code as in Fig. 1.

Figure 3 | Cryo-electron tomography analysis of Pol I transcribing rDNA genes. a, 2-nm-thick tomographic slice through a cryo-electron tomography image with two of the Miller trees, showing the terminal knobs (grey circles), the DNA (typical examples marked by blue arrows), the RNA (red-pink arrows), and the Pol I enzymes (yellow and dark yellow circles for Miller trees 1 and 2, respectively). Several nucleosomes are attached to DNA like beads on a string (white box). b, Three-dimensional surface rendering of Miller tree 1 in a, showing the terminal knobs (light grey), DNA (blue), RNA (red), possible RNA-modifying complexes (cyan), and Pol I complexes (yellow). c, Schematic of three successive Pol I enzymes (the upstream, central and downstream Pol I) overlaid with their probability density localization (heat map) (Extended Data Fig. 8a, b). d, Fit of the Pol I elongation complex ribbon model from single-particle cryo-EM into the cryo-electron tomography reconstruction in grey. The good fit observed here is not possible with the expanded conformation of Pol I (Extended Data Fig. 9b). Colour code as in Fig. 1.

Taken together, we used two independent cryo-electron microscopic approaches to define the contracted Pol I conformation as the active transcribing state of the enzyme. This revealed that all three eukaryotic RNA polymerases adopt a highly similar closed active centre conformation during transcription elongation. Together with published data, our results provide evidence that the elongation phase of transcription is regulated by cleft contraction and expansion. In particular, rearrangements of nucleic acids in the cleft above the active site may be coupled to binding of factors in the pore beneath the cleft. Thus, Pol I not only undergoes induced fit to align nucleic acids with the catalytic site but is apparently also regulated allosterically.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Preparation of Pol I elongation complex (EC). Endogenous 14-subunit Pol I was prepared from Saccharomyces cerevisiae as described previously, with some modifications. Yeast strain CB010 expressing a C-terminal Flag/10× histidine-tagged A190 subunit was fermented and collected during the exponential phase. For Pol I purification, 350 g of cells were used. Proteins were precipitated overnight at 4 °C with ammonium sulphate (2 M). Re-solubilized Pol I was enriched by large-scale affinity purification with Ni-NTA beads (Qiagen). Further enrichment by anion- and cation-exchange chromatography yielded pure Pol I enzyme. The sample was applied to a Superose 6 10/300 size-exclusion column (GE Healthcare) in 5 mM HEPES (pH 7.8), 150 mM potassium acetate, 1 mM MgCl2, 10 µM ZnCl2 and 10 µM β-mercaptoethanol.

DNA and RNA were purchased from IDT and Exiqon (Vedbaek), respectively. The nucleic acid scaffold sequences were as follows. Template DNA, 5′-AAGCTCAAGTACTTAAGCCTGGTCATTACTAGTACTGCC-3′; non-template DNA, 5′-GGCAGTACTAGTAAACTAGTATTGAAAGTACTTGAGCTT-3′; RNA, 5′-UAUCUGCAUGUAGACCAGGC-3′ (in underlined nucleosides, a methylene bridge connects the 2′-O and the 4′-C atoms of the ribose ring, thereby forming locked nucleic acids). Nucleic acids were annealed by continuously decreasing the temperature from 95 °C to room temperature over a period of 60 min. EC assembly was achieved by incubating Pol I (300 µg, 3.5 mg ml⁻¹) with a twofold molar excess of scaffold for 10 min at room temperature (Extended Data Fig. 1).
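As a worked check of the scaffold design and the assembly stoichiometry, the sketch below (a) compares the non-template strand with the reverse complement of the template strand to locate the mismatched bubble, (b) finds how many 3′-terminal RNA nucleotides pair contiguously with the template, and (c) converts the Pol I amount into moles. It ignores the LNA modifications, and the ~590 kDa mass assumed for 14-subunit Pol I is not stated in this section; the function names are hypothetical.

```python
# Sequences as listed above (5'->3'); LNA positions are not encoded here.
TEMPLATE = "AAGCTCAAGTACTTAAGCCTGGTCATTACTAGTACTGCC"
NONTEMPLATE = "GGCAGTACTAGTAAACTAGTATTGAAAGTACTTGAGCTT"
RNA = "UAUCUGCAUGUAGACCAGGC"

_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")  # DNA complement; RNA U pairs with A


def revcomp_dna(seq):
    """Reverse complement written as DNA (accepts DNA or RNA input)."""
    return seq.translate(_COMPLEMENT)[::-1]


def bubble_positions(template, nontemplate):
    """0-based positions on the non-template strand that do not pair with the template."""
    rc = revcomp_dna(template)
    return [i for i, (a, b) in enumerate(zip(rc, nontemplate)) if a != b]


def rna_hybrid_length(template, rna):
    """Longest run of RNA 3'-terminal nucleotides whose complement occurs in the template."""
    rc = revcomp_dna(rna)  # begins with the complement of the RNA 3' end
    for k in range(len(rc), 0, -1):
        if rc[:k] in template:
            return k
    return 0


def pol_i_nanomoles(mass_ug, mass_kda=590.0):
    """Nanomoles of Pol I; the 590 kDa default is an assumed, approximate value."""
    return mass_ug * 1e-6 / (mass_kda * 1e3) * 1e9


bubble = bubble_positions(TEMPLATE, NONTEMPLATE)
print("mismatched non-template positions:", bubble[0], "to", bubble[-1])
print("RNA-template hybrid length (nt):", rna_hybrid_length(TEMPLATE, RNA))

pol_nmol = pol_i_nanomoles(300.0)   # 300 µg Pol I
scaffold_nmol = 2.0 * pol_nmol      # twofold molar excess of scaffold
volume_ul = 300.0 / 3.5             # 300 µg at 3.5 mg/ml (= 3.5 µg/µl)
print(f"Pol I: {pol_nmol:.3f} nmol in {volume_ul:.1f} µl; scaffold: {scaffold_nmol:.3f} nmol")
```

The string search in `rna_hybrid_length` does not check register against the bubble; it only reports the longest contiguous complementary stretch from the RNA 3′ end.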
Single-particle cryo-EM. For single-particle cryo-EM, Pol I ECs at a concentration of 200 µg ml⁻¹ were cross-linked with 0.9 mM BS3 (Sigma Aldrich) for 30 min at 30 °C after optimization (Extended Data Fig. 1). The reaction was stopped by adding 50 mM ammonium bicarbonate, and the sample was purified by size-exclusion chromatography on a Superose 6 3.2/300 column (GE Healthcare) equilibrated in 5 mM HEPES (pH 7.8), 150 mM potassium acetate, 1 mM MgCl2, 10 µM ZnCl2 and 10 µM β-mercaptoethanol. A 4 µl aliquot of 100 µg ml⁻¹ purified sample was applied to a glow-discharged (10 s) R1.2/1.3 UltrAuFoil grid (Quantifoil) and plunge-frozen in liquid ethane (Vitrobot Mark IV (FEI) at 95% humidity, 4 °C, 8.5 s blotting time, blot force 14). Dose-fractionated movies (30 frames, 0.25 s each) were collected at a nominal magnification of 130,000× (1.05 Å per pixel) in nanoprobe energy-filtered transmission electron microscopy (EFTEM) mode at 300 kV with a Titan Krios (FEI) electron microscope using a GIF Quantum s.e. post-column energy filter in zero-loss peak mode and a K2 Summit detector (Gatan). The camera was operated in dose-fractionation counting mode with a dose rate of ~7.5 electrons per pixel per second (0.25 s single-frame exposure) and a total dose of ~56 electrons per Å². Defocus values ranged from −0.6 to −3 µm with marginal (<0.1 µm) astigmatism. Global motion correction was performed as described”, but single-particle cryo-EM images were not partitioned.
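The exposure parameters above are linked by simple arithmetic: exposure time is frames × frame time, and a dose rate in electrons per pixel converts to electrons per Å² by dividing by the pixel area at the specimen. A minimal sketch, assuming the nominal 1.05 Å pixel size applies exactly (the helper name is hypothetical):

```python
def movie_dose(dose_rate_e_per_px_s, n_frames, frame_s, pixel_A):
    """Return exposure (s), total dose (e/pixel), total dose (e/Å²) and dose per frame (e/Å²)."""
    exposure_s = n_frames * frame_s
    per_pixel = dose_rate_e_per_px_s * exposure_s
    per_A2 = per_pixel / pixel_A ** 2   # each pixel covers pixel_A x pixel_A Å² at the specimen
    return exposure_s, per_pixel, per_A2, per_A2 / n_frames


exposure, per_px, per_A2, per_frame = movie_dose(7.5, 30, 0.25, 1.05)
print(f"{exposure:.2f} s, {per_px:.2f} e/px, {per_A2:.1f} e/Å², {per_frame:.2f} e/Å² per frame")
```

With these nominal numbers the accumulated dose is about 56 electrons per pixel, or about 51 electrons per Å² at 1.05 Å per pixel; the exact per-Å² value depends on how the detector dose rate was calibrated, which this section does not specify.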
Single-particle cryo-EM image processing. Parameters of the contrast transfer function (CTF) of each micrograph were estimated with CTFFIND4 (ref. 29). In a first step, ~1,500 particles were picked with the semi-automated swarm method of EMAN2 e2boxer.py”®. Relion was used for the whole image-processing workflow*! unless stated otherwise. Reference-free 2D classes were generated, seven of which were used for template-based auto-picking after filtering to 20 Å. We extracted 401,000 particles from 2,300 micrographs with a 230 × 230 pixel box and used them for further processing. Pixels more than 5 standard deviations from the mean value were replaced with random values from a Gaussian noise distribution. All images were normalized during pre-processing so that the average density of the background was zero. False-positive particles showing very bright dots, presumably gold contamination, were removed by manual inspection or unsupervised 2D classification. The remaining 282,000 particles were aligned on a reference generated from PDB entry 4C2M° filtered to 40 Å. To correct for local motion and radiation damage, we used the movie-processing function of Relion, including ‘particle polishing’, in which the resolution-dependent decay caused by radiation damage is taken into account*". Local resolution was estimated as described***?.
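The two pre-processing steps described above (outlier replacement and background normalization) can be sketched for a single particle image as follows. This is a minimal NumPy illustration, not Relion's implementation: the circular background region outside 90% of the box radius, the extra scaling of the background standard deviation to 1, and the function names are all assumptions.

```python
import numpy as np


def background_mask(shape, radius_frac=0.9):
    """True for pixels outside a centred circle of radius radius_frac * (box size / 2)."""
    ny, nx = shape
    y, x = np.indices(shape)
    r = np.hypot(y - (ny - 1) / 2.0, x - (nx - 1) / 2.0)
    return r > radius_frac * min(ny, nx) / 2.0


def clean_and_normalize(img, n_sigma=5.0, radius_frac=0.9, rng=None):
    """Replace pixels more than n_sigma SDs from the mean with Gaussian noise, then set the
    background mean to 0 (and, as an assumed extra step, the background SD to 1)."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.asarray(img, dtype=float).copy()
    mu, sd = out.mean(), out.std()
    outliers = np.abs(out - mu) > n_sigma * sd
    if outliers.any():
        good = out[~outliers]
        # Draw replacement values from the statistics of the non-outlier pixels.
        out[outliers] = rng.normal(good.mean(), good.std(), size=int(outliers.sum()))
    bg = background_mask(out.shape, radius_frac)
    out -= out[bg].mean()
    out /= out[bg].std()
    return out


rng = np.random.default_rng(0)
particle = rng.normal(0.0, 1.0, size=(64, 64))
particle[10, 10] = 1000.0  # e.g. a very bright gold-contamination pixel
cleaned = clean_and_normalize(particle, rng=np.random.default_rng(1))
print(cleaned.max(), cleaned[background_mask(cleaned.shape)].mean())
```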

During classification of single-particle cryo-EM images (Extended Data Fig. 2), we first separated out particles lacking nucleic acids. To this end, the Pol I cleft of the average resulting from the first round of alignment was masked. The subsequent classification led to four classes: (1) nucleic acid-free Pol I (115,000 particles); (2) Pol I elongation complex (94,000 particles; hereafter referred to as ‘EC’); (3) Pol I elongation complex with an alternative DNA conformation (37,000 particles); and (4) other particles (35,000 particles). Among the nucleic acid-free polymerase particles, 80,000 displayed a defined position of the C-terminal domain of A12.2. We refer to the single-particle average of these particles as the ‘monomer’.

In a second step, a mask around the dimerization domain was applied to remove particles from which the A49–A34.5 subcomplex had dissociated. This left 32,000 and 40,000 particles for the Pol I monomer and Pol I EC, respectively. To visualize the mobile stalk, we then applied a mask around A14–A43 during refinement, allowing only local searches.
Gold-standard Fourier shell correlations (FSCs) were calculated during the 3D refinement in Relion between two independently refined halves of the data. According to the FSC 0.143 criterion, global resolutions of 4.0 Å and 3.8 Å were estimated for the Pol I monomer and EC structures, respectively, which were sharpened with temperature factors of −146 Å² and −149 Å², respectively.
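Sharpening with a temperature factor B is usually expressed in Fourier space as multiplying each amplitude by exp(−B s²/4), where s = 1/d is the spatial frequency; a negative B, as quoted above, therefore amplifies high-resolution terms. The sketch below assumes that convention and a cubic map on a NumPy grid; the optional low-pass cutoff and the function names are assumptions, and this is not Relion's post-processing code.

```python
import numpy as np


def bfactor_sharpen(vol, pixel_A, bfactor_A2, lowpass_A=None):
    """Scale Fourier amplitudes by exp(-B s^2 / 4), with s in 1/Å; negative B sharpens.
    Optionally zero all terms beyond the resolution lowpass_A (Å)."""
    F = np.fft.fftn(vol)
    axes = [np.fft.fftfreq(n, d=pixel_A) for n in vol.shape]  # cycles per Å
    grids = np.meshgrid(*axes, indexing="ij")
    s2 = sum(g ** 2 for g in grids)
    F = F * np.exp(-bfactor_A2 * s2 / 4.0)
    if lowpass_A is not None:
        F[s2 > (1.0 / lowpass_A) ** 2] = 0.0
    return np.real(np.fft.ifftn(F))


def gain_at(resolution_A, bfactor_A2):
    """Amplitude multiplier applied at a given resolution (Å)."""
    return np.exp(-bfactor_A2 / (4.0 * resolution_A ** 2))


print(f"gain at 4.0 Å with B = -146 Å²: {gain_at(4.0, -146.0):.1f}x")
vol = np.random.default_rng(0).normal(size=(32, 32, 32))
sharp = bfactor_sharpen(vol, 1.05, -146.0, lowpass_A=4.0)
```

The low-pass step reflects the common practice of not boosting noise beyond the estimated resolution; where exactly the cutoff was placed for these maps is not stated here.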
Structural modelling. Two separate models were built for the monomer and the Pol I EC. PDB entry 4C2M? was used as the starting model in both cases. Models were constructed lacking the expander, the connector and, in the case of the EC, the C-terminal domain of A12.2. The models were further truncated by removing the peripheral subcomplexes A49–A34.5 and A14–A43. The starting models were placed in the densities for the monomer and the EC by fitting in UCSF Chimera™, followed by rigid-body fitting with Phenix real-space refinement*. Rigid-body groups were defined based on module definitions originally proposed for Pol II“. A starting model for DNA and RNA was derived from bovine Pol II"! and further refined. Structurally altered regions were adjusted to the density in COOT*® followed by real-space refinement in Phenix. To generate complete models, structures of the subcomplexes A49–A34.5 and A14–A43 were fitted into the classified map in Chimera. No changes were made within the domains during model building, except for the A34.5 C-terminal tail. The models were validated using the FSC between the model and the map, EMRinger*” and Molprobity*®.
Miller tree preparation and cryo-electron tomography imaging. Miller chromatin spreads” were prepared with some modifications as described’, using the NOY1071 yeast strain with 25 copies of ribosomal DNA (rDNA) repeats”°. Yeast cells were grown to mid-log phase (absorbance (A600) = 0.4) in YPG medium supplemented with 1 M sorbitol at 30 °C. YPG medium contains 1% (w/v) yeast extract, 2% (w/v) bacto-peptone and 2% glucose. 1 ml of yeast cell culture in mid-log phase was added for 4.5 min to the preheated 20T zymolyase (Amsbio, Biotechnology) solution (5 mg/200 µl zymolyase in YPG medium at 30 °C) for a slight digestion of the yeast cell wall. Subsequently, the yeast cell culture was centrifuged at 13,000 rpm for 15 s and the pellet was resuspended in 1 ml of 0.0025% Triton X-100 (Sigma-Aldrich) in ddH2O at pH 9.2, adjusted with pH 10 buffer (Thermo Fisher Scientific).
The yeast suspension was transferred to a flask containing 5 ml of 11 mM KCl 
solution. The lysate was pipetted and incubated in a hydrophobic plastic Petri dish 
(Carl Roth GmbH + Co. KG) placed on a shaker for 45 min. Sucrose was excluded
from the sucrose-formalin solution as used in ref. 39. To fix chromatin, 400 µl
of 37% formaldehyde (Sigma-Aldrich) solution was applied for 5 min. The yeast
lysate was deposited on electron microscopy grids with a ~30 nm thick carbon 
support layer evaporated by a carbon coater 208Carbon (Cressington) and glow 
discharged for ~1 min using a home-made device. Subsequently they were placed 
within home-built grid chamber insets and centrifuged within an Eppendorf 5810R 
centrifuge (Eppendorf) for 5 min at ~2,200g at 20°C. Before plunge-freezing 
the grids were transferred to an 11 mM KCl solution prepared with ddH2O at pH 9.2.
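For preparing the growth medium described above, the percentages translate directly into grams per volume. A small arithmetic sketch, reading every percentage (including the unspecified "2% glucose") as w/v and using the molar mass of D-sorbitol (182.17 g mol⁻¹, not given in the source); the 500 ml batch volume and the helper name are arbitrary examples.

```python
SORBITOL_G_PER_MOL = 182.17  # molar mass of D-sorbitol (assumed reference value)


def ypg_sorbitol_grams(volume_ml):
    """Grams per component for YPG + 1 M sorbitol, reading each % as w/v (g per 100 ml)."""
    litres = volume_ml / 1000.0
    return {
        "yeast extract (1%)": 10.0 * litres,
        "bacto-peptone (2%)": 20.0 * litres,
        "glucose (2%)": 20.0 * litres,
        "sorbitol (1 M)": 1.0 * SORBITOL_G_PER_MOL * litres,
    }


for name, grams in ypg_sorbitol_grams(500.0).items():
    print(f"{name}: {grams:.2f} g")

zymolyase_mg_per_ml = 5.0 / 0.200  # 5 mg zymolyase per 200 µl
print(f"zymolyase solution: {zymolyase_mg_per_ml:.0f} mg/ml")
```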

The grids were immediately plunge-frozen in liquid ethane with a Vitrobot Mark IV (FEI) using blot force 25, 3 s blotting and 10–15 s draining time, with the blotting chamber set to 100% humidity at 10 °C. Cryo-grids were mounted into autoloader grids with C-clippings (FEI) in an EM FC6 cryo-microtome (Leica) that was cooled with liquid nitrogen under gaseous flow to −150 °C. During mounting, grids were visually inspected to determine whether they contained an intact carbon film.

Tilt-series were recorded using DigitalMicrograph (Gatan Inc.) at a nominal magnification of 33,000× (4.0 Å per pixel) in EFTEM mode at 300 keV using a Titan Krios with a Gatan GIF Quantum s.e. post-column energy filter in zero-loss peak mode and a K2 Summit detector. The camera was operated in counting mode with a dose rate of ~15 electrons per pixel per second and a total dose of ~100 electrons per Å². The tilt-series ranged from −63° to +63° with an angular increment of 2° and defocus set at −5 µm. Tilted images were aligned without fiducials”! and reconstructed by super-sampling SART“’. The CTF was measured and corrected in slices in 3D.
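The tilt range and increment fix the number of images per tilt-series, and, if the total dose is split evenly across them (an assumption; the tilt order and dose distribution are not stated here), the dose and exposure time per image follow. A minimal sketch with a hypothetical helper name:

```python
def tilt_scheme(start_deg=-63, stop_deg=63, step_deg=2, total_dose=100.0,
                dose_rate_e_per_px_s=15.0, pixel_A=4.0):
    """Tilt angles, dose per tilt (e/Å²) and exposure per tilt (s), assuming an even dose split."""
    angles = list(range(start_deg, stop_deg + 1, step_deg))
    dose_per_tilt = total_dose / len(angles)
    dose_rate_A2 = dose_rate_e_per_px_s / pixel_A ** 2   # e/Å²/s at the specimen
    return angles, dose_per_tilt, dose_per_tilt / dose_rate_A2


angles, dose_per_tilt, exposure_s = tilt_scheme()
print(len(angles), f"{dose_per_tilt:.3f} e/Å² per tilt", f"{exposure_s:.2f} s per tilt")
```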

Reconstruction and segmentation of Miller trees. 3D reconstructions were 
visualized with the EM package in Amira (FEI & Zuse Institute)“ and analysed
in TOM package*’. Segmentation of the Miller trees was performed manually in 
Amira by drawing contours encompassing individual features on mildly Gaussian 
low-pass filtered tomograms using the high-contrast option of super-sampling 
SART®. 

Sub-tomogram averaging of Pol I enzymes. Sub-tomograms containing tran- 
scribing Pol I enzymes on rDNA were manually selected. The enzymes were 
re-centred using a Gaussian blob of the size of Pol I. The positions of all enzymes were 
subsequently indexed such that they were placed sequentially on the DNA. As the 
DNA was visible in the reconstructions, the indexing was unambiguous (Extended 
Data Fig. 8d). For sub-tomogram averaging (that is, the cryo-electron tomo- 
graphy structure) we selected five Miller trees according to the following criteria: 
(1) They visually showed a transcriptional directionality (several Miller trees were 
not completely visualized in the field of view). (2) All Pol I enzymes aligned according to the Miller tree directionality. (3) The RNA exit site matched previous observations!! (Extended Data Fig. 8c, e). This resulted in a total of 225 Pol I enzymes
contributing to the final sub-tomogram average. 

Sub-tomogram averaging was then performed on each Miller tree individually. 
This was to guarantee that the directionalities of the enzymes were not mixed 
owing to the globular shape of the enzyme, the pseudo-symmetry axis, and the 
varying ice thickness of the recording area leading to different signal-to-noise 
ratio among the enzymes. The Euler angles of each of the three consecutive Pol I enzymes per Miller tree were determined a priori by calculating the centre-to-centre vectors. Constrained sub-tomogram averaging was performed on sub-tomograms of 64 × 64 × 64 voxels using a spherical mask (~20 nm
diameter). To ensure the robustness of the sub-tomogram averaging, two different 
starting references were used: (1) the average of all rotationally pre-aligned Pol I
enzymes per strand, and (2) a Gaussian blob of the size of Pol I. Both converged to 
approximately the same density. During sub-tomogram averaging of each individual 
Miller tree, polymerases were low-pass filtered and the alignment was run with 
a translational freedom of 10 voxels around the Gaussian blob refined position, 
a full rotational freedom for phi and psi, and a constrained rotational freedom of 
±30° for theta with a 5° sampling increment, until the average reached
convergence. The missing wedge was taken into account during the entire 
alignment. 
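The a priori orientation and the constrained angular search described above can be illustrated as follows: the direction from an enzyme to its downstream neighbour gives polar angles, and the search keeps phi and psi free while restricting theta to ±30° in 5° steps. This sketch uses a generic z-axis polar convention rather than the TOM package's Euler convention, and the 5° steps for phi and psi are assumptions (the text gives the 5° increment only for the constrained angle); the function names are hypothetical.

```python
import numpy as np


def prior_angles(centre, downstream):
    """(phi, theta) in degrees of the unit vector from an enzyme to its downstream
    neighbour, in a generic z-axis polar convention (not necessarily TOM's)."""
    v = np.asarray(downstream, dtype=float) - np.asarray(centre, dtype=float)
    v /= np.linalg.norm(v)
    theta = np.degrees(np.arccos(np.clip(v[2], -1.0, 1.0)))
    phi = np.degrees(np.arctan2(v[1], v[0]))
    return phi, theta


def constrained_grid(theta0, theta_range=30, theta_step=5, phi_step=5, psi_step=5):
    """Angular search grid: full phi and psi, theta restricted to theta0 +/- theta_range."""
    thetas = theta0 + np.arange(-theta_range, theta_range + theta_step, theta_step)
    phis = np.arange(0, 360, phi_step)
    psis = np.arange(0, 360, psi_step)
    return [(p, t, s) for p in phis for t in thetas for s in psis]


phi0, theta0 = prior_angles([0.0, 0.0, 0.0], [0.0, 18.0, 0.0])
grid = constrained_grid(theta0)
print(phi0, theta0, len(grid))
```

A uniform grid in Euler angles oversamples near the poles; production packages typically use more even angular sampling, which this sketch does not attempt.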

The sub-tomogram averages of each Miller tree were individually inspected, and the 3′-to-5′ directionality of the Pol I enzymes on each Miller tree was analysed. If all enzymes had the same directionality (that is, the signal-to-noise ratio was sufficient to align them properly), their sub-tomogram average was used for further processing. If the enzymes had conflicting directionalities (including completely random directionality), their sub-tomogram average was rejected. Five Miller trees met this criterion. Their enzyme directionality was compared with the Miller-tree directionality, and all conformed. Finally, 225 enzymes (of the 993 total enzymes in the tomograms) from the five selected Miller trees were subjected to refined sub-tomogram averaging, and the resulting cryo-electron tomography structure reached a resolution of ~29 Å with the FSC 0.5 threshold criterion (~31 Å when compared to the single-particle cryo-EM structure).
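The resolutions quoted here are read off where a Fourier shell correlation curve drops below a threshold (0.5 or 0.143). A minimal NumPy sketch of that procedure, assuming two unmasked cubic maps on the same grid, shells one Fourier pixel wide up to Nyquist, and linear interpolation at the crossing; it is not the TOM or Relion implementation, and the function names are hypothetical.

```python
import numpy as np


def fsc_curve(v1, v2, pixel_A):
    """FSC between two equally sized cubic maps.
    Returns shell-centre spatial frequencies (1/Å) and FSC values."""
    F1, F2 = np.fft.fftn(v1), np.fft.fftn(v2)
    axes = [np.fft.fftfreq(n, d=pixel_A) for n in v1.shape]
    grids = np.meshgrid(*axes, indexing="ij")
    s = np.sqrt(sum(g ** 2 for g in grids))
    n_shells = min(v1.shape) // 2
    edges = np.linspace(0.0, 1.0 / (2.0 * pixel_A), n_shells + 1)  # up to Nyquist
    centres, values = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        m = (s >= lo) & (s < hi)
        num = np.sum(F1[m] * np.conj(F2[m])).real
        den = np.sqrt(np.sum(np.abs(F1[m]) ** 2) * np.sum(np.abs(F2[m]) ** 2))
        centres.append(0.5 * (lo + hi))
        values.append(num / den if den > 0 else 0.0)
    return np.array(centres), np.array(values)


def resolution_at(centres, values, threshold):
    """Resolution (Å) where the FSC first falls below threshold (linear interpolation)."""
    below = np.nonzero(values < threshold)[0]
    if below.size == 0:
        return 1.0 / centres[-1]          # never drops: limited by the last shell
    i = below[0]
    if i == 0:
        return 1.0 / centres[0]
    s0, s1, f0, f1 = centres[i - 1], centres[i], values[i - 1], values[i]
    return 1.0 / (s0 + (threshold - f0) * (s1 - s0) / (f1 - f0))


rng = np.random.default_rng(0)
a = rng.normal(size=(32, 32, 32))
b = rng.normal(size=(32, 32, 32))
s, same = fsc_curve(a, a, 4.0)     # identical maps: FSC = 1 in every shell
_, indep = fsc_curve(a, b, 4.0)    # independent noise: FSC fluctuates around 0
print(resolution_at(np.array([0.1, 0.2, 0.3]), np.array([0.9, 0.5, 0.1]), 0.143))
```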
Additional cryo-electron tomography analysis. In the tomograms both the DNA 
and the RNA could be seen emanating from the enzymes (Extended Data Fig. 8c, d). 
They were manually localized as close as possible to the enzyme and subsequently 
sub-tomogram averaging was performed around this position. To obtain evidence 
for the RNA exit channel visualized in the single-particle cryo-EM structure, we 
made three independent attempts to manually select the position of exiting RNA 
on Pol I in the tomogram without prior knowledge of the structure (Extended 
Data Fig. 8e). The resulting point distribution of exiting RNA on the cryo-electron 
tomography structure agreed with the location of the RNA exit channel in the 
single-particle cryo-EM map and further confirmed the correct superposition of 
the two independent structures. 

The distances between consecutive Pol I enzymes were calculated as the Euclidean distance between their centre positions. For plotting the probability density function, one enzyme was centred, the downstream enzyme was placed on the x axis, and the upstream enzyme was placed in the x–y plane. For each set of three consecutive enzymes, the in-plane angle was estimated.
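The reference frame described above (central enzyme at the origin, downstream enzyme on the x axis, upstream enzyme in the x–y plane) can be built with one rotation per triple. A minimal NumPy sketch with hypothetical names and example coordinates; pooling the transformed upstream and downstream positions over all triples and histogramming them would give a density map of the kind shown in Fig. 3c.

```python
import numpy as np


def canonical_frame(upstream, centre, downstream):
    """Place the central enzyme at the origin, the downstream enzyme on +x and the
    upstream enzyme in the x-y plane (y >= 0). Returns (upstream_xyz, downstream_xyz)."""
    u, c, d = (np.asarray(p, dtype=float) for p in (upstream, centre, downstream))
    ex = (d - c) / np.linalg.norm(d - c)
    w = (u - c) - np.dot(u - c, ex) * ex      # part of the upstream vector orthogonal to ex
    if np.linalg.norm(w) < 1e-12:
        raise ValueError("upstream enzyme is collinear with the x axis; plane undefined")
    ey = w / np.linalg.norm(w)
    ez = np.cross(ex, ey)
    R = np.vstack([ex, ey, ez])               # rows are the new axes
    return R @ (u - c), R @ (d - c)


def in_plane_angle(up_xyz, down_xyz):
    """Angle (degrees) at the origin between the transformed upstream and downstream positions."""
    cos = np.dot(up_xyz, down_xyz) / (np.linalg.norm(up_xyz) * np.linalg.norm(down_xyz))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))


up, down = canonical_frame([1.0, 2.0, 3.0], [4.0, 6.0, 3.0], [4.0, 6.0, 13.0])
print(up, down, in_plane_angle(up, down))
```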


For fitting of structures to the cryo-electron tomography reconstruction, rigid 
body fitting of the cryo-electron tomography and the single-particle cryo-EM 
structures of the Pol I EC was performed automatically, using MATLAB scripts 
(MATLAB and Statistics Toolbox Release 2012b, The MathWorks, Inc., Natick), 
implemented in the TOM package* (all scripts are freely available upon request) 
as well as Chimera**. This resulted in a global cross-correlation value of ~0.8 and an FSC shown in Extended Data Fig. 9a. The contour level of the cryo-electron tomography structure for volume rendering of our average was calculated from the theoretical molecular mass, assuming an average protein density of 0.8 kDa nm⁻³.
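Two quantities in this paragraph can be sketched directly: a global cross-correlation between maps on a common grid, and a contour level chosen so that the enclosed volume matches the molecular mass divided by 0.8 kDa nm⁻³. The sketch below uses a mean-centred normalized cross-correlation (whether the TOM and Chimera scores quoted are mean-centred is not stated here) and picks the threshold by ranking voxel values; the function names and toy map are hypothetical.

```python
import numpy as np


def global_cc(map_a, map_b):
    """Mean-centred normalized cross-correlation between two maps on the same grid."""
    a = np.asarray(map_a, dtype=float).ravel()
    b = np.asarray(map_b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))


def mass_threshold(vol, mass_kda, voxel_nm, density_kda_per_nm3=0.8):
    """Density threshold enclosing the volume expected for a protein of the given mass."""
    n_voxels = int(round(mass_kda / density_kda_per_nm3 / voxel_nm ** 3))
    values = np.sort(np.asarray(vol, dtype=float).ravel())[::-1]  # brightest first
    n_voxels = min(max(n_voxels, 1), values.size)
    return values[n_voxels - 1]


vol = np.arange(1000, dtype=float).reshape(10, 10, 10)
thr = mass_threshold(vol, mass_kda=80.0, voxel_nm=1.0)   # 80 kDa / 0.8 = 100 nm³ = 100 voxels
print(thr, int((vol >= thr).sum()), global_cc(vol, 2.0 * vol + 3.0))
```

For the Pol I average one would pass the molecular mass of the complex and the sub-tomogram voxel size; neither value is restated in this paragraph.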
Data availability statement. Cryo-electron microscopy densities were deposited in the Electron Microscopy Data Bank under the accession codes EMD-4147 and EMD-4148 for the EC and the free monomer, respectively. The sub-tomogram average density was deposited in the Electron Microscopy Data Bank under the accession code EMD-4149. Model coordinates were deposited in the Protein Data Bank under the accession codes 5M3F and 5M3M for the EC and the free monomer, respectively.
Extended Data Figure 1 | Preparation of Pol I elongation complex (EC) for single-particle cryo-EM. a, Size-exclusion chromatogram (Superose 6 Increase 3.2/300; GE Healthcare) of reconstituted Pol I EC. Higher absorbance at 260 nm (red line) than at 280 nm (blue line) indicates the presence of nucleic acids. Coomassie-stained SDS–PAGE analysis of pooled peak fractions shows the presence of all 14 Pol I subunits. b, Coomassie-stained SDS–PAGE analysis of titration with BS3 cross-linker. The gel is cropped to the large subunits A190 and A135. A shift to higher molecular weight is observed with increasing BS3 concentration, indicating successful cross-linking. Based on interpolation, we chose 0.9 mM BS3 (not shown) as the appropriate concentration for final sample preparation.
Extended Data Figure 2 | Single-particle cryo-EM particle sorting pipeline. Annotated arrows indicate the direction of processing and provide information regarding the number of particles used and the classification masks applied. A representative micrograph of the Pol I EC under cryo conditions showed particles of the expected size. A set of 1,500 particles was picked manually with EMAN2 (ref. 30) and used to generate initial 2D classes for template-based auto-picking in Relion*’. After cleaning by manual inspection and 2D classification, per-frame B-factor weighting and translational movie alignment were applied to the remaining 282,000 particles. The colouring of the surfaces is according to the standard polymerase subunit colouring: A190, grey; A135, wheat; A49, light blue; A43, slate; AC40, red; A34.5, pink; Rpb5, magenta; Rpb6, silver-blue; AC19, yellow; Rpb8, green; A14, hot pink; A12.2, orange; Rpb10, blue; Rpb12, lemon. Template DNA, non-template DNA and RNA are depicted in medium blue, sky blue and red, respectively. The structures against a greyed background indicate the final EC and Pol I monomer structures.
Extended Data Figure 3 | Quality of single-particle cryo-EM reconstructions. a, Top and bottom views of local resolution surface maps. b, Representative areas of the single-particle cryo-EM density for Pol I EC (left panel) and Pol I monomer (right panel). The A190 helix α19 (upper panel) and the A135 strand β40 (lower panel) are depicted together with the refined model superimposed. c, Angular distribution of particle images. Red dots indicate views with at least one particle assigned within 1°. Black shading represents the number of particles. The orientation occupancy is similar for both structures and covers most of the angles. d, FSC curves. Blue lines indicate the FSC between half maps of the respective reconstruction, and red lines indicate the FSC between the derived model and the single-particle cryo-EM map.
Extended Data Figure 4 | Comparison of all eukaryotic and the bacterial elongation complexes. All structures are depicted in front view. Bridge helix and active site are highlighted in green and magenta, respectively. Modules were defined as in Fig. 2d. a, Pol I EC from this study. b, Pol II EC’. c, Pol III EC!°. d, Bacterial EC!°.
Extended Data Figure 5 | Additional details on Pol I EC. a, Cleft loops. Ribbon model of ECs of all three S. cerevisiae RNA polymerases superimposed on the bridge helix®'°. Bridge helix (green) and downstream DNA together with DNA–RNA hybrid (blue and red) are given for Pol I. b, Ribbon model of free Pol I (PDB code 4C2M (ref. 5), black and orange) superimposed with Pol I EC (grey, green and pink) on the inner A190 funnel helix α21 shown. As a consequence of cleft contraction, parts of the shelf module move in and reduce the width of the pore, impairing binding of the C-terminal domain of A12.2. c, Modelling the A12.2 C-terminal domain into the pore of the contracted Pol I EC results in a clash. In the upper part, a surface representation of domains in free Pol I shows that the C-terminal domain of A12.2 fills the pore that is lined by the A190 funnel helix α21 and loop 1572–1579 of the A190 cleft domain in the shelf module. In the lower part, cleft contraction observed in the EC reduces the width of the pore, causing a steric clash in the model.
Extended Data Figure 6 | Free monomeric Pol I single-particle cryo-EM structure. a, Ribbon model of free, monomeric Pol I solved by single-particle cryo-EM. The views correspond to the ‘front’ and ‘top’ views with the incoming downstream DNA pointing towards the reader. Colour code as in Extended Data Fig. 2. b, Free Pol I (PDB 4C2M, black), Pol I monomer (pink) and Pol I EC (orange) were superimposed onto A135. For clarity, only subunits A190 and A135 are shown. c, Scheme of the observed conformations as displayed in b. The cleft width was measured at two positions: (i) between residue K434 of chain A135 and residue E414 of chain A190 and (ii) between residue K1331 of chain A190 and residue G231 of chain A190. The distance bars above and below the polymerase cartoon indicate the distances between the protrusion and the clamp core helices (above) and at the entry site of downstream DNA (below). The difference (Δ) between the free Pol I (ref. 5) and the Pol I monomer, the Pol I–Rrn3 complex (ref. 17) and Pol I EC, respectively, is given in brackets. For the measurement of the relative movement of the clamp core helices shown in the magnified inset, all Pol I structures were aligned on the A135 subunit, and the distance between residue E414 of subunit A190 of the free Pol I° and the same residue of the Pol I EC, Pol I–Rrn3!” and Pol I monomer, respectively, was measured. d, Electron density of the bridge helix in the free Pol I monomer. e, Comparison of bridge helices in the free Pol I monomer (orange) and free Pol I (black) from the crystal structure. f, Electron density (semi-transparent grey) is shown together with models for the bridge helix, trigger loop (both grey) and the C-terminal domain of A12.2. The expander (red) is not present in this structure but is modelled here based on the crystal structure of the free Pol I dimer, revealing a clash. g, Inhibited** and paused”? bacterial polymerases superimposed on Rpb1 of the free Pol I monomer. For clarity, only A190 and A135 of Pol I are shown, and the β′-NCD of the bacterial polymerases is excluded from the visualization.

LETTER
Extended Data Figure 7 | Yeast cells, lysed to leak their nucleoplasm, prepared with negative stain and visualized under cryo conditions. a, Electron micrograph of a negatively stained lysed yeast cell, with the nucleoplasm spread on the carbon support film. The upper left of the micrograph is occupied by the grid bar. The yeast cell has released the nuclear content on the grid, which appears as an electron-lucent leakage. b, Electron micrograph of the spread nucleoplasm of a plunge-frozen yeast cell at close-to-native conditions. In the lower left corner, the remains of a yeast cell can be seen as an electron-dense patch. The nucleoplasm is embedded in an ice layer, and the asterisks indicate three Miller trees found in the vicinity of this cell. The Miller tree indicated with the red asterisk was used for recording of the tilt-series in Fig. 3.
Extended Data Figure 8 | Relative positions of polymerases towards each other and of protruding nucleic acids. a, Histogram of the centre-to-centre distance d of two consecutive Pols as depicted in Fig. 3c. Distances smaller than 12 nm can be measured because the enzyme is not completely spherical; at certain rotational arrangements, the centre-to-centre distance can therefore be smaller than the average diameter of Pol I. b, Histogram of the in-plane angle spanned by three consecutive Pols as depicted in Fig. 3c. c, Focused sub-tomogram averaging around the RNA. The RNA exits Pol I as an ~10 Å thick density, both in the slice and in the isosurface representation. d, Sub-tomogram average with the alignment focused on the downstream DNA. The downstream DNA is a long, straight 2-nm density, both in the slice and in the isosurface representation. In both c and d, the Pol I molecule is a globular ~12-nm featureless density. e, Stereo pair of the sub-tomogram average showing the positions of the nascent RNA chain as green balls. The positions, which were manually identified by three independent users without previous knowledge of the positions of the sub-tomogram average, correspond closely to the position of the RNA exit site that was postulated from the X-ray crystallography structure?.
Extended Data Figure 9 | Comparisons between cryo-electron tomography and single-particle cryo-EM structures. a, FSC of the cryo-electron tomography structure with a resolution of 29 Å (purple line) and FSC between the cryo-electron tomography structure and single-particle cryo-EM structure with estimated resolutions of 44 Å and 31 Å, measured at the FSC 0.5 and 0.143 criteria, respectively (green line). b, Poor fit of the expanded, free Pol I crystal structure* (PDB: 4C2M) to the cryo-electron tomography density (grey). In the expanded state, a significant part of the clamp core helices lies outside the cryo-electron tomography density (26% outside, 74% inside), whereas in the contracted state they are almost completely enclosed (4% outside, 96% inside). In addition, in the expanded state 54% of the cryo-electron tomography density remains unoccupied, compared with 19% in the contracted state (also compare Fig. 3d, reproduced here for comparison).
Extended Data Table 1 | Model refinement statistics

                                  Pol I monomer     Pol I EC
Map CC (whole unit cell)          0.850 (0.844)     0.728 (0.724)
Map CC (around atoms)             0.703 (0.739)     0.733 (0.763)
rmsd, bonds (Å)                   0.008 (0.008)     0.007 (0.007)
rmsd, angles (°)                  1.00 (0.86)       0.929 (0.864)
All-atom clashscore               32.11 (26.42)     13.72 (10.7)
Ramachandran plot
  outliers                        0.4% (0.2%)       0.4% (0.1%)
  allowed                         4.9% (4.6%)       6.6% (6.5%)
  favored                         94.7% (95.2%)     93% (93.4%)
Rotamer outliers                  1.0% (0%)         1.1% (0%)
C-beta deviations                 1 (0)             1 (0)
EMRinger score                    0.65 (0.72)       2.74 (2.93)
MolProbity score                  2.56 (2.24)       2.12 (1.98)

Statistics for the core of Pol I, excluding A14–A43 and A34.5–A49, are given in parentheses.
Scientists need to step forward if they are to ensure that politicians understand the importance of their work.


SCIENCE ADVOCACY 


Get involved 


Presenting science to politicians in a way they can understand can have good outcomes. 


With so many science-based challenges facing the world, researchers who can help to inform and affect policy can have an outsized impact. We asked Connie Lee, Tamara Galloway and Niklas Höhne to describe how they have helped to shape government policy, and how others can learn from their experiences.

As chair of the public-policy committee for the American Society for Cell Biology (ASCB), Lee is a prominent advocate for science. She studied mammalian mitochondria before becoming an editor of The EMBO Journal and a deputy editor of Cell. As assistant dean for basic science at the University of Chicago in Illinois, she helps to oversee nine science departments.

After training as a physicist, Höhne turned his attention to climate change, a field in which he hoped to make a global difference. As a founding partner of the New Climate Institute in Cologne, Germany, and a professor of greenhouse-gas mitigation at Wageningen University in the Netherlands, he works at the intersection of science and policy.

Galloway, an ecotoxicologist at Exeter University, UK, can say with certainty that her research, and her advocacy, have brought real-world results. Her testimony in front of Parliament in May helped to bring about a UK ban on microplastics in personal-care products, an important source of marine pollution. In June, she discussed her research on pollutants in front of a committee of the United Nations in New York City.


Connie Lee, assistant 
dean for basic science 
at the University of 
Chicago in Illinois 




Scientists have a lot of demands on their time. 
But getting involved in policy and advocacy 
is extremely important. Politicians hear from 
many lobbyists. If they don’t hear from scientists 
too, we might be left out. 

I got bitten by the policy bug in 2008, when I visited Capitol Hill, the home of the US Congress, as a representative of the ASCB. My dream is for every US scientist to visit Capitol Hill: you never know what questions politicians are going to ask. We met staff and elected members of Congress, and they had so many misunderstandings about science. A lot of people think that National Institutes of Health (NIH) funding only affects people in Bethesda, Maryland, where the NIH is headquartered. They don’t realize that the funding spreads out to all 50 states, supporting research and creating jobs.

The lack of scientific understanding among 
policymakers can be frustrating. Members of 
the US House of Representatives will scan titles 
of NIH grants to find items that sound wasteful. 
These grants have been peer reviewed, but the 
politicians just look at the titles and take them 
out of context. It can be important work, but it’s 
mocked and dismissed. We can't let that sort of 
thing get us down. 

There’s a communication gap between scientists and politicians. Scientists have to learn to explain the importance of their own work, whether they’re talking to a policymaker, a dean or a potential donor. But we have to share a bigger message, too. We need to advocate for the institution of science and the importance of funding basic science. You never know where basic research can lead. The methodology behind CRISPR was discovered by looking at how bacteria protect themselves. Now it’s used to edit genomes.

The ASCB lobbies for issues outside the lab, 
such as immigration and the importance of 
international collaboration. We want to make 
sure junior researchers from other countries 
receive visas that last long enough to allow 
them to get the training they need. 

Policy and advocacy can take as much time as you’re willing to give it. A lot of scientific societies have outreach positions, which are a great place to get started. You can join a government-relations board at your university or just offer a tour of your lab whenever a politician visits. And when you do get a grant funded, write to your local senator or representative and thank them for supporting science. It’s baby steps, but we need to build relationships so they can see us as a resource.


THE FACTS MATTER


Niklas Höhne,
climate scientist 
at Wageningen 
University, the 
Netherlands 


Science covers the questions at the heart of society’s problems. When it comes to climate change, it’s absolutely essential that the research community helps to translate science into options for policymakers.

I study international climate negotiations, such as the Paris agreement of 2015. The stated aim of the agreement was to limit warming to 1.5°C above pre-industrial levels. It requires much analysis to look at each country’s emission proposals and then add them up to see whether they are on track to meet the overall goal. As I reported in November at the climate-change conference in Marrakesh, Morocco, our models show that some countries’ current emissions proposals aren’t sufficient to reach the Paris goal. Policymakers need this information so that they can adjust their country’s emission targets, if they have the will to do so.

I would say that most governments are generally well-informed about climate change. The goal to limit warming to 1.5°C is stronger than the previous one of 2°C, and that’s because politicians understood the evidence. Scientists were able to show that a 2°C rise wouldn’t be safe for the planet.

Some politicians, including the president-elect of the United States, have denied that climate change exists. If individual politicians don’t want to be convinced, there’s not much more that scientists can do. Still, it’s important to keep gathering data and reaching out to policymakers and the general public. The scientific community has a duty to continue to provide evidence and explain what we really know about human-caused emissions and global temperatures.

Every 6 years, for example, about 2,000 researchers work together to create a report for the Intergovernmental Panel on Climate Change on the current situation. It is a technical report that most politicians would have trouble understanding. But scientists can explain the key points and the take-home messages. Without that translation, their research won’t have much of an impact.

In some parts of society, we seem to be moving to an era beyond factual argument. Emotion seems to matter more than the facts. We have total access to information, but we also have total access to misinformation. Scientists have to make the facts matter again. We can do that by communicating results in an accessible way.

I have been doing this for 20 years, and there have been a lot of setbacks. But I’m still hopeful. I got into climate change because I wanted to have an impact on the world, and I still think climate scientists can accomplish that. Despite the challenges, controlling climate change is doable. We have the technology we need to reach the goals. My main motivation is to give politicians the tools that they need to get this right. Progress may be slower than I hoped, but we’ll see how things work out.
DELIVER YOUR MESSAGE


Tamara Galloway, 
ecotoxicologist at 
Exeter University, UK 


The UK government is full of people who used to be bankers and lawyers. There’s a great lack of scientific understanding. Most scientists aren’t interested in becoming politicians, but it’s still possible for them to become involved and inform policy.

As a scientist, I always wondered why it took government so long to act on issues, especially when the evidence was already clear. Then, in November 2015, I participated in the Royal Society’s ‘Week in Parliament’ scheme, and spent a week in London shadowing a Member of Parliament (MP). It was an amazing experience, and it helped me to understand that government is a giant monolith. Change comes slowly, in most cases.

In May, two other environmental scientists and I addressed a parliamentary select committee of MPs on marine pollution caused by microplastics, spheres of plastic less than 5 mm in diameter. The committee had three hours to ask us any question they wanted, and we didn’t know what to expect. I felt like I’d been called into the headmaster’s office.

The committee members asked intelligent and well-informed questions, trying to put everything in context. There’s a lot of hysteria on the topic and websites with false information, so I needed to provide impartial scientific evidence. You don’t want to sound as if you’re pushing an agenda. I explained that microplastics, which are often found in cosmetics and shampoos, aren’t actually toxic, but that they can disrupt the feeding and reproduction of many marine organisms.

Shortly after the hearing, the committee 
announced that microplastics will be banned 
from personal-care products in the United 
Kingdom by the end of 2017. The science we 
were doing had had a real impact, and I was 
amazed that it happened so quickly. 

The pinnacle of my policy outreach, so far, was when I spoke about my research in front of a United Nations panel in New York in June. After that, I felt I could tackle anything.

The lesson for me is that we must speak up. Scientists tend to become more and more specialized, to the point where it can be difficult to talk to other researchers, let alone the general public. I use my children as a sounding board. If they understand, I know I’m ready.


INTERVIEWS BY CHRIS WOOLSTON 
These interviews have been edited for clarity and 
length. 


SCIENCE FICTION


BREATHE THE LAST BITS OF AIR 


BY EMILY MCCOSH 


Are there stars outside? 

Are you actually asking me that? Say it again. 
Ask me again, if you're serious. 

I can ask again, I’m not afraid. Are there stars
outside any more? 

You should be afraid. And there may be. 
Well, go look. 

How do you think I should go about doing that?

Look out of the window, genius.

There are no windows... there is no glass.

There must be windows. How do you look outside without windows?

There’re no windows. And there’s no looking outside. Anyhow, if there were windows to look out of, there’d be nothing to look outside at.

What kind of statement is that? I told you, look at the stars.

There are no stars. They’ve all winked out of 
existence. One. By. One... 

Utter rot. 

Don’t mock me. 

If there was anything left in the Universe 
worth mocking, it would be you. 

One more insult, and I swear, I'll leave you 
all alone. 

You won't. You have nowhere to leave to. No path to walk upon. We're stitched together. Anyhow, how do you know the stars have gone out when there are no windows to look out of?

I’ve opened the door.

Then open the door and look at the stars.

I’d rather not.

I said go look at the stars! You never do as I 
ask. I can’t even look for myself. My eyes are 
gone. They're whiter than milk — 

There’s no more milk. 

— they’re larger than galaxies — 

There are no more galaxies. No more stars. I 
told you that. 

— they're brighter than stars. They must be. 
Have you ever tried looking at my eyes while 
I'm sleeping? I think you must have. They 
must be brighter than stars. 

(She reaches out, lets her fingers hover over 
his eyes, but pulls her hand back.) 

I’ve never looked at your eyes. Promise.

Never?

Never. I’ve told you that. Listen. You're stubborn and you never listen to me. Why don't I leave you when you never listen to me?

Well, maybe you love me? 

Maybe? What kind of word is ‘maybe’? I either 
love you or I don't. 

Well? 

Well what? 

Do you? 




Do I what? 


Love me? 

Probably not. 

(There is silence like the end of the world.)

Are you there?

I'm here. 

What were you saying? 

I was saying I probably don't love you. I might have at one point but I probably don't now. I wish I did, but I don’t think there is such a thing as love any more. It caught the last train out of here. When all the people ran away to the dying stars, it went right along with them. It’s gone. It has died. Just like the Sun died.

The Sun can’t die. It’s the Sun.

It’s grey. 

Grey? 

Grey. Less than black. More than white. Grey 
as slate. 

Is that what love is like now? 

I think so. I can't be sure. I can't feel my heart.

Can I feel it?

(He reaches out, lets his hand hover over her chest. It doesn’t come close enough to touch.)

I’d rather you didn't.

(Withdraws his hand.) 

I’d rather I did.

It doesn't matter. It’s my heart. Even if I can't feel it.

I could, you know? If I wanted to. If I really wanted to, you’d let me.

You couldn't. You can't even see me. 

Can you see me? 

Not hardly. 

Well, I think you're right. 

About what? 

About love like the dead Sun. 

I probably am. It’s a shameful thing to be right 
about. 

Is it? 

(He thinks, and laughs.) 

Maybe it’s better to be right about it than to 
be wrong and think with all your heart that 
you’re right. If you could feel your heart, that
is... Are there stars outside? 

There may be. 

Let’s go see. 

There’s nothing left outside... Anyhow, you can’t walk.

I know. You can carry me. Take me to the 
window. Let’s go outside. We can use the 
door. You can carry me. 

I can’t.

I know you can. You shouldn't lie. 

Don't call me a liar. I'll leave you. 

You should leave me. 

I should. 

Let’s go outside. 

(They go outside, both of them.) 

Describe it to me. 

Well, it’s nothingness. 

Is that it? 

It is. There’s no description for nothingness.
It’s like sleeping, the world is sleeping... Let 
me set you down. 

Yes ... Yes, do. Don't leave me though. 

I won't. 

I know... You can look at my eyes, if you’d like.

Will I like it?

I don't know. 

(She crawls into his lap, opens his eyes and 
lifts his face.) 

Well, what do you know? They are brighter 
than stars. 

Are they? 

Yes, more beautiful too. But just as sad. 

Ah. 

(Kisses his lips, gently.) 

I suppose... you can feel my heart now... if you’d like.

I would... 

Then why hesitate? 

I don't know... 

(He wraps his arms around her waist, lays his 
cheek against her breast.) 

Well, it’s unsteady. 

Yes, well, that has to do with you, not the 
world splintering to bits. 

Oh. 

(Sighs) 

Now? 

Now, what? 

Now what do we do? 

Now? Breathe the last bits of air, I suppose.

(Sighs)

Breathe.


Emily McCosh writes and daydreams from 
her home in southern California. Visit her 
online at oceansinthesky.com or 
@wordweaveremily 